PREFACE 


This report contains full documentation of the 
POLSIM model constructed as a project of the Effectiveness 


Division, Planning Branch. 


The present report is essentially organized into 
three tiers. The first comprises chapters one, two, seven, 
eight and nine. These give the reader an overview of the 
nature of the model together with some test results and 
certain conclusions based on the project's experience. The 
second tier is represented by the full text of the report 
(excluding appendices). This takes the reader into the 
detail of the model's structure and the manner in which it 
was estimated. The third and last tier is represented by 
the appendices which treat certain structural and estimation 
questions in greater depth, document the parameter values of 
the current model, present validation results and list model 


software. 


The model has been run at Statistics Canada on an 
IBM 370-165 computer. Initial year input tapes for 1967 and 
1971 are stored in the Statistics Canada Tape Library together 
with synthesized or simulated population tapes for the years 
1968, 1969, 1970, 1971 and 1972. At the moment of writing 
this report, the model is run from a source deck but will 


shortly be available in a more efficient, compiled form. 
1. INTRODUCTION 


Motivation for the Development of POLSIM 


The public sector affects both the level and 
distribution of national income. Until fairly recently 
concern has largely been focused on questions of the 
level. This has been reflected in Canada in the 
development of a number of aggregate economic models 
which have tended to concentrate on the business sector. 
Perhaps the best known of these are RDX2 and CANDIDE. 
Although these models do provide treatment of both the 
household and government sectors, the manner in which 
this is done does not lend them to the study of the 
income distributional consequences of government programs. 
This is the case because the distribution of income is 
most meaningfully considered in relation to individual 
households or families, the recipients of national 
income, and these models do not maintain sufficient 
household or family detail. It is clear that any 
exercise designed to elucidate distributional issues 
must begin with the household sector portrayed in great 


particularity, preferably at the level of the individual 


person. 


Of course, given the availability of elaborate 
sets of microdata, it is possible to develop fairly 
simple models that enable one to pose "what if" questions 
for some past period. But this kind of static analysis, 
although useful, does not allow one to come to grips 
with a host of questions bearing on income distributional 


issues which are crucially dependent on time. To be policy 
relevant, it is also necessary to have the ability to 
forecast the workings of a large number of factors, 
economic and demographic, that bear on the household 


sector and determine the distribution of income. 


It is also necessary to have the capacity to model 
government programs as comprehensively as possible, and 
in considerable structural detail. It is not possible 
to know what any single program will do if we are 
unable to situate that program in the context of an 
environment produced, in part, by a number of other 
programs. Further, it is not enough to model the 
general direction of the effects produced by a given 
program as a whole. We are concerned to understand the 
significance of particular program designs. To do this 
we require the ability to test the effect caused by an 


alteration of internal program components. 


Microdata Simulation 


The term simulation is usually used to describe 
techniques related to the construction of models or 
simulators whose operations are intended to resemble 
the behaviour of actual or potential operating systems. 
Microdata simulation involves the construction of 
simulators intended to function in the same manner as 
operating systems comprised of a large number of basic 
components or decision units. In carrying out this 
form of simulation one may employ the technique of 
endowing microcomponents with behavioural functions and 


of deriving the consequences (in terms of individual 
behaviour) of different environments (specified by a 
set of independent variables) on the microcomponents. 
Or, one may employ the technique of stochastic events. 
In this case microcomponents are not endowed with 
explicit specifications as to their behaviour under 
differing environments but rather are thought to behave 
as particles governed by specified laws of chance. 

That is, the causal relationships that actually determine 
behaviour are considered only implicitly, either 
because they are too complex to be modelled, or because 
insufficient data exists to estimate the specified 
model. One is only able to observe that the micro- 
components reside in a certain "state" for a period of 
time and then move to other "states". The precise laws 
governing individual movement are unknown. What is 
known is that if a large number of "state" changes or 
movements are observed, the behaviour in question can 


be described as if it depended solely on chance. 


Events which are assumed to be governed by chance 
are described by probability distributions. The fact 
that we are able to specify these distributions means 
that we know something about the process in question, 
although we can't fully explain the causal laws under- 
lying it. Changes in the environment take the form of 
changes in the probability distributions which govern 
the chance outcomes. In the limit, if we understood 
the process completely, we could specify these probabilities 
as either zero or one for particular individuals, 
depending on the environmental factors. That is, we 
would have learned enough about the process to eliminate 


all of the randomness and it would become completely 


deterministic. 
Overview of the Model 


The probabilistic analog of a deterministic causal 
process is the first order Markov-chain. In this kind 
of process, the future state of the world is completely 
independent of time periods preceding the present. In 
the present context, then, micro-component behaviour 
would be described completely by its present state, and 
the Markov-chain probabilities that relate the present 
to the future. In the broadest possible terms this is 
the POLSIM Model: it is a Markov-chain simulation of 
individual demographic, labour force, and market income 


behaviour. 
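
The first-order Markov property described above can be sketched in a few lines. This is only an illustration: the two states and the transition probabilities are invented for the example and are not POLSIM parameters, and the names TRANSITIONS, next_state and simulate are hypothetical.

```python
import random

# Hypothetical one-year transition probabilities between two illustrative
# states. The numbers are invented for this sketch, not taken from POLSIM.
TRANSITIONS = {
    "employed":   {"employed": 0.9, "unemployed": 0.1},
    "unemployed": {"employed": 0.6, "unemployed": 0.4},
}

def next_state(current, rng):
    """Draw next year's state from the current state alone (first-order Markov)."""
    draw = rng.random()
    cumulative = 0.0
    for state, prob in TRANSITIONS[current].items():
        cumulative += prob
        if draw < cumulative:
            return state
    return state  # guards against floating-point shortfall in the running sum

def simulate(initial, years, rng):
    """Return the sequence of states visited over the given number of annual steps."""
    path = [initial]
    for _ in range(years):
        path.append(next_state(path[-1], rng))
    return path

rng = random.Random(1967)
print(simulate("employed", 5, rng))
```

Note that next_state never looks at earlier entries of the path; the history before the present state plays no role, which is exactly the Markov assumption.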


More specifically, POLSIM is an annual microdata 
model of the Canadian household sector. The basic 
component of the model is the individual person. 
Individual persons may be associated into nuclear 
family units in the model but the prime focus is always 
maintained on the individual. The model receives, as 
exogenous input, a specification of the native Canadian 
population for some year. This specification is made 
in terms of a number of characteristics (a state vector) 


for every individual in the initial population. 


The individual state vector is comprised of three 
different sets of characteristics: demographic (e.g., 
age, sex, etc.), activity (e.g., weeks employed, weeks 
unemployed, etc.) and income (e.g., annual wages, 
annual dividend income, etc.). Most of these character- 
istics change over time. One of the main functions of 


the model is to effect these changes in the light of 
the individual's particular circumstances, as portrayed 
by his state vector immediately before the change, and 
conditions prevailing in the socio-economic environment. 


Two of these changes, death and emigration, in effect 


destroy an individual state vector. 


A second main function of the model is to introduce 
immigrants into the model "population". This is 
essentially a problem of completely constructing an 
individual state vector for each immigrant. This 
problem does not arise in the case of the native Canadian 
population, because for this group the state vector is 
given (see Chapter 2). In the case of immigrants, on 
the other hand, no such information is available. 

Last, the model functions to compute the effects of a 
range of government programs. At the moment these 
effects are mainly restricted to changes in financial 
flows, but the possibility exists to extend this 


treatment to comprehend other effects as well. 


The POLSIM model is constructed as a number of 
connected blocks: (i) immigration, (ii) demographic, 
(iii) activity status, (iv) market income, and (v) 
policy. Each of these blocks contains a series of 
processes or transformations which operate on individual 
state vectors to produce annual change. These processes 
are ordered within blocks. For example, within the 
demographic block an individual must first pass through 
the "survival" process before being considered for the 
"emigration" process, etc. And similarly, the blocks 
are themselves arranged in the order listed above 


(illustrated in Figure 1.1). That is, the model requires 
that individuals pass through the demographic block 
before entering the activity block, and so on. This 
arrangement is, of course, artificial. In the real world 
these processes do not operate in some established 
sequence but rather simultaneously and [illegible]. 
However, no great violence is done to reality provided 
that we derive consistent measures of the probabilities 
which "describe" these processes and arrange the processes 
such that those which affect the outcome of other 
processes precede the latter in sequence. The question 
of consistency of probabilities and the sequence of the 
various processes is treated in Appendix A.1. 


Figure 1.1 

THE POLSIM MODEL 


The purpose of the immigration block is to open 
the model to international in-bound migrants. This is 
done, as we have already related, by fabricating the 
appropriate number and kind of individual state vectors 
and introducing these into the model population. In a 
simulation of population and other changes from one 
year to the next, the immigrants of year t+1 first 
appear in the model at the beginning of year t+1. That 
is, all immigrants are assumed to arrive on January 1. 
The immigration block receives two inputs exogenous to 
the model: the aggregate annual number of immigrants to 
Canada (e.g., 150,000 individuals), and the aggregate 
rate of unemployment that is assumed to obtain in 


January of the year being simulated. 


The demographic block is concerned with the problem 
of updating both the native and current immigrant 
populations through the processes of: (i) emigration, 
(ii) death, (iii) [illegible], (iv) divorce/separation, (v) 


marriage, (vi) fertility, (vii) family dependency, and 
(viii) internal migration. The demographic block takes 
as exogenous input the whole set of individual state 
vectors which collectively describe the native Canadian 
population in some given year t. These are passed 
through the block one by one, together with the output 
of the immigration block (i.e., the individual state 
vectors of all current year immigrants), and updated by 
the processes mentioned above. The demographic block, 
then, produces an integrated (i.e., native plus current 
immigrant) population possessing partially updated 
individual state vectors (i.e., updated in their 


demographic characteristics). 


The activity block accepts the set of partially 
updated state vectors from the demographic block and an 
exogenous input consisting of thirteen monthly Canadian 
aggregate unemployment rates. Here is the first instance 
in the model where the socio-economic environment, as 
distinct from the particular characteristics of the 
individual (as described by his state vector) influences 
the nature of the changes to which an individual is 
subjected. The activity block consists of three 
processes. The first, a monthly labour force model, 
places individuals in one of four possible states: 
school, other non-labour force, employed, and unemployed. 
Here the time frame of the model shifts, temporarily, 
from one year to one month. The second process, 
educational attainment, updates the education status of 
individuals who have spent the requisite time in school 
during the simulation year. The last process, type, 


determines how new labour force entrants (i.e., persons 
who either leave school or the non-labour force and 
join the labour force during the simulation year) will 
relate to the labour force. This designation segregates, 
for the purposes of the model, the labour force into 

two groups - those persons who will be subject to the 
risk of unemployment and those persons who will not. 

It also determines whether or not the type of employment 
the person enters will be such as to allow him a private 
pension on retirement. The activity block, then, does 
its job by further updating the set of activity 
characteristics for all of the individuals in the 
integrated population. The final update, that of 


income characteristics, can now be undertaken. 


The market income block consists of five processes, 
one for each of the different kinds of market source 
income that are identified in the model. These are 
wages of persons subject to unemployment, income from 
employment of persons not subject to unemployment, 
dividend income, other property income, and private 
pension income. In the case of wages of persons subject 
to unemployment the market income block generates 
weekly wage rate change. Annual wage income is then 
calculated as the product of weeks employed and the 
updated weekly wage. Persons not subject to unemployment 
simply have their annual employment income updated 
directly. In both of these cases the model calculates 
wage change in real terms and accepts an exogenous 
specification of wage inflation to convert to money 
wages. The remaining kinds of income are altered on a 


pure money change basis. New entrants to the work 


force are endowed with initial wage rates or annual 
wages in a manner which reflects their accumulated 
human capital. After these initial endowments, further 
changes in employment or wage income are effected in 
the same manner as with those persons who are already 
in the work force. In sum, the market income block 
accepts partially updated state vectors from the 
demographic and activity blocks and an exogenously 


posited wage inflation rate; it then completes the 


update of individual state vectors. 
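
For persons subject to unemployment, the wage calculation described above can be illustrated as follows. This is a minimal sketch: the function name and arguments are hypothetical, and the multiplicative compounding of the real change and the exogenous wage inflation rate is an assumption of the example, since the text does not give the exact formula.

```python
def updated_annual_wage(weekly_wage, real_change, wage_inflation, weeks_employed):
    """Update a weekly wage and derive annual wage income from it.

    real_change and wage_inflation are proportional rates (0.03 means 3%).
    Combining them multiplicatively is an assumption of this sketch.
    """
    # Real wage change first, then conversion to money terms.
    new_weekly_wage = weekly_wage * (1.0 + real_change) * (1.0 + wage_inflation)
    # Annual wage income is the product of weeks employed and the updated weekly wage.
    annual_wage = new_weekly_wage * weeks_employed
    return new_weekly_wage, annual_wage

weekly, annual = updated_annual_wage(100.0, 0.02, 0.05, 40)
print(round(weekly, 2), round(annual, 2))
```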


The model population is now fully updated for one 
year. All changes that will occur between year t and 
year t+1 have been made. It remains to pass the population 
through the policy block in order to calculate the 
effects of defined government programs for the year t+1. 
The policy block of the model consists of a number 
of algorithms designed to simulate certain of the 
effects of these programs (e.g., taxes paid, welfare 
benefits received, etc.). These algorithms may be 
thought of as a series of processes which accept the 
state vectors of individuals comprising a single 
family, together with a specification of a particular 
government program, as inputs and which then calculate 


program effects as outputs. 
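
The shape of such a policy algorithm (family state vectors plus a program specification in, program effects out) might look like the sketch below. The program shown is a simple negative income tax with invented parameters; the function name, the dictionary keys and the parameter values are all hypothetical and do not describe any program actually modelled in POLSIM.

```python
def negative_income_tax(family, program):
    """Benefit payable to one family under a hypothetical negative income tax.

    family  : list of person records, each a dict holding 'total' (total income)
    program : dict with 'guarantee_per_person' and 'reduction_rate' (both invented)
    """
    family_income = sum(person["total"] for person in family)
    guarantee = program["guarantee_per_person"] * len(family)
    # The guarantee is reduced by a fixed fraction of family income, never below zero.
    return max(0.0, guarantee - program["reduction_rate"] * family_income)

program = {"guarantee_per_person": 1500.0, "reduction_rate": 0.5}
family = [{"total": 1000.0}, {"total": 0.0}]
print(negative_income_tax(family, program))
```

Keeping the program specification as data rather than code is what allows an internal program component, such as the reduction rate, to be altered and the simulation rerun.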


Running the Model 


One pass through the entire model produces an 
update of the population for a single year. A succession 
of passes, with a given set of time labelled exogenous 
inputs, traces a unique time track of the population. 


Any change in any one of the several exogenous inputs 
will result, of course, in another time track. However, 
in terms of actual computation, it is only necessary to 
create a completely new time track from the beginning 
of the block in which the exogenous input has been 


altered. This results in a considerable saving of both 


effort and computing cost. 


The demographic block may be run separately to 
describe a number of "demographic" tracks depending on 
the assumptions made with respect to aggregate immi- 
gration. These "demographic" tracks may in turn be run 
through the activity block to generate a number of 
"demographic-activity" tracks, depending on the assump- 
tions made with respect to aggregate unemployment. 
Finally, one or more of these "demographic-activity" 
tracks can be expanded into a larger number (dependent 
on the number of assumptions made with respect to wage 


inflation) of "demographic-activity-income" tracks. 
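
The saving from rerunning only the blocks downstream of a changed input can be made concrete by counting block runs. The sketch below is illustrative only: the assumption values are invented, and build_tracks is a hypothetical helper that counts runs rather than performing any simulation.

```python
def build_tracks(immigration, unemployment, wage_inflation):
    """Enumerate time tracks, counting how often each block has to be run."""
    runs = {"demographic": 0, "activity": 0, "income": 0}
    tracks = []
    for imm in immigration:
        runs["demographic"] += 1          # one demographic track per immigration assumption
        for unemp in unemployment:
            runs["activity"] += 1         # reuses the demographic track already built
            for infl in wage_inflation:
                runs["income"] += 1       # reuses the demographic-activity track
                tracks.append((imm, unemp, infl))
    return tracks, runs

# Hypothetical assumption sets, chosen only for illustration.
tracks, runs = build_tracks([100_000, 150_000], ["low", "high"], [0.04, 0.08, 0.12])
print(len(tracks), runs, sum(runs.values()))
```

With two immigration, two unemployment and three wage inflation assumptions there are twelve complete tracks, but only 2 + 4 + 12 = 18 block runs are needed, against 36 if every track were rebuilt from the demographic block onward.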


Examples of Simulation Experiments Feasible Using POLSIM 


In section 1.1 above we mentioned that one of the 
main motivations for the development of POLSIM was the 
desire to evaluate the distributional consequences of 
government programs (mainly tax and transfer programs). 
At the moment of writing this report, a fairly large 
number of government programs have been modelled (see 
Chapter 8 below). The redistributional consequences of 
any complex of these programs may be examined in the 


context of one or more time tracks. 
For example, the cost and initial incidence of a 
given negative income tax could be examined over the 
period, say, 1973-1976 under different assumptions in 
respect of immigration, unemployment and wage inflation. 
Or, the yield and initial incidence of the indexed 
personal income tax may be estimated for the same 
period and under the same set of assumptions. Alternatively, 
we may be interested in the interactions between 
two or more programs: for instance, in the number of 
persons subject to cumulative marginal tax rates of 50% 
or more from all sources. Another simulation experiment 
could be the determination of the persistence of 
individual poverty over time or the relative efficacy 
of direct transfers as opposed to other measures for 


poverty alleviation. 
2. INITIAL YEAR STATE DESCRIPTION 


In section 1.3 above we indicated that POLSIM 
requires the specification of a model "population" for 
some given year. In the present chapter we shall 
describe this initial population more fully. We will 
begin by looking more closely at the composition of the 
individual state vector to see the particular characteristics 
which it contains. We shall then briefly consider a 
number of alternative data sources for initial population 
specification, and end the chapter with a discussion of 


the initial model population. 
The Individual State Vector 


In section 1.3 we described in very broad terms 
the nature of the individual state vector, that is, the 
fact that it contains demographic, activity and income 
characteristics. The complete vector in its most 
detailed form contains 23 characteristics as shown 


below: 


 1. Previous Year Family Unit Number (LYUNIT) 
 2. Family Unit Identifier (UNIT) 
 3. Province (PROVIN) 
 4. Size of Family (SIZE) 
 5. Census Family Relationship (DEPNCY) 
 6. Marital Status (MSTAT) 
 7. Age (AGE) 
 8. Sex (SEX) 
 9. Major Source of Income (MAJSIN) 
10. Weeks in School (WKSCHL) 
11. Weeks employed (WKEMP) 
12. Weeks unemployed (WKUNEM) 
13. Weeks in the non-labour force (WKNLF) 
14. Education (EDUCTN) 
15. April Activity Status (YRACT) 
16. Weight (WEIGHT) 
17. Employment Category (TYPE) 
18. Employment Income (EMPINC) 
19. Interest and Other Investment Income (INTRST) 
20. Dividends (DIVDNS) 
21. Retirement Pension, Superannuation, and 
    Annuities (RETIRE) 
22. Other Money Income (OTHER) 
23. Total Income (TOTAL) 


The names of most of the characteristics listed 
above explain the meaning of the particular characteristic. 
Some, however, are less evident. In particular, 
"Family Unit Identifier" is simply a unique number 
which all members of a given nuclear family have in 
common. "Weight" is the sample weight which an individual 
has. This is explained more fully in section 2.3 in 
the discussion of the initial model population. The 
entire individual state vector is described in greater 


detail in Appendix A.2. 
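
As a rough illustration, the 23-element state vector could be held in a record such as the one below. The field names follow the mnemonics listed above; the Python types, the default values and the family_members helper are assumptions made for this sketch (the actual codings are those documented in Appendix A.2).

```python
from dataclasses import dataclass, fields

@dataclass
class StateVector:
    # Identification and demographic characteristics
    lyunit: int = 0      # Previous Year Family Unit Number (LYUNIT)
    unit: int = 0        # Family Unit Identifier (UNIT)
    provin: int = 0      # Province (PROVIN)
    size: int = 1        # Size of Family (SIZE)
    depncy: int = 0      # Census Family Relationship (DEPNCY)
    mstat: int = 0       # Marital Status (MSTAT)
    age: int = 0         # Age (AGE)
    sex: int = 0         # Sex (SEX)
    majsin: int = 0      # Major Source of Income (MAJSIN)
    # Activity characteristics
    wkschl: int = 0      # Weeks in School (WKSCHL)
    wkemp: int = 0       # Weeks employed (WKEMP)
    wkunem: int = 0      # Weeks unemployed (WKUNEM)
    wknlf: int = 0       # Weeks in the non-labour force (WKNLF)
    eductn: int = 0      # Education (EDUCTN)
    yract: int = 0       # April Activity Status (YRACT)
    weight: int = 50     # Weight (WEIGHT)
    type_: int = 0       # Employment Category (TYPE); underscore avoids the builtin name
    # Income characteristics
    empinc: float = 0.0  # Employment Income (EMPINC)
    intrst: float = 0.0  # Interest and Other Investment Income (INTRST)
    divdns: float = 0.0  # Dividends (DIVDNS)
    retire: float = 0.0  # Retirement Pension, Superannuation, and Annuities (RETIRE)
    other: float = 0.0   # Other Money Income (OTHER)
    total: float = 0.0   # Total Income (TOTAL)

def family_members(population, unit):
    """All persons sharing a Family Unit Identifier, i.e. one nuclear family."""
    return [person for person in population if person.unit == unit]

population = [StateVector(unit=1, age=40), StateVector(unit=1, age=38),
              StateVector(unit=2, age=25)]
print(len(fields(StateVector)), len(family_members(population, 1)))
```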
2.2 Alternative Sources of Initial Year Data 


A number of sets of household sector microdata do 
exist which could lend themselves to the specification 
of the model population for some initial year. We will 
comment very briefly on these from the point of view of 
population coverage, the nature and quality of data and 


the frequency of issue. 


2.2.1 Department of National Revenue (Taxation) 


Tax Analysis Data Base 


The Tax Analysis Data Base is an annual stratified 
sample consisting of 1.25% of filed T1 Short and T1 
General tax returns. The Tax Analysis Data Base is, 


naturally, heavily disposed toward information concerning 


income, deductions, exemptions and other income taxation 


data, but certain other data are also carried. 
The annual nature of this DNR data set means, of 
course, that it is very timely. Microdata projections 
can be readily checked. And, the possibility of 
continuous updating of the initial year is extremely 
useful for the accuracy of microdata projections. 
Furthermore, the quality of income data for upper 
income level persons is probably superior to any other 
available. However, two serious limitations attach to 
this data set. First, the population coverage is 
obviously quite incomplete. Only those persons who 
file income tax returns are represented. This means 
that many low income persons are not covered. Second, 
sample records are those of individual persons as 
distinct from families. Although it is possible to 
infer something about the family of the tax-filer, it 
is not possible to identify two or more tax-filers who 


belong to the same family unit. 
2.2.2 Unemployment Insurance Commission Data Base 


The Unemployment Insurance Commission Data Base is 
a 2% sample of all persons with social insurance numbers. 
It consists of data describing the demographic, financial, 
and employment characteristics of approximately 2% of 
the Canadian working population. The sample comprises 
approximately 250,000 individual records, and was 
compiled from two main sources: Statistics Canada and 
the Department of National Revenue. The Statistics 
Canada files contained data derived from UIC administrative 
records, as well as information on occupation and 
industry. The DNR records supplied income information. 
Data sets for persons who are not unemployment insurees 
or who do not file income tax returns are not complete. 
The data base contains information for the years 1965 to 
[end year illegible]. 


The UIC data base suffers from the same weaknesses 
as does the DNR data base. Population coverage is 
restricted to persons with social insurance numbers, 
and family records cannot be inferred from the given 
individual records. In addition, there is little 
prospect that this data base will be updated on a 
regular basis, since it was originally developed for a 
special purpose task. It is thus not a promising 
source as a base population for a micro-simulation 


model. 
2.2.3 The Census 


The census is of course the most comprehensive 
data base in existence in Canada. Unfortunately, for 
purposes of micro-simulation, it has several defects. 
First, it is too comprehensive. It is just not practical 
to simulate an entire population. Instead, what is 
wanted is some sample of the population. A sample of 
the census records could be taken, of course, but this 
would involve considerable expense, problems of weighting, 
and extensive computer programming. Second, the census 
is untimely. Since it is only taken every 10 years, it 
rapidly loses its usefulness as time from the previous 
census elapses. And finally, it is unwieldy. Since 
any simulation model can at best focus on only a limited 
number of variables, extensive software would have to 
be developed simply to cut the individual census records 
down to manageable size. For all of these reasons, the 
census is not an ideal source as an initial year model 
population. 
2.2.4 Survey of Consumer Finance 


The data carried in the Survey of Consumer Finance 
is extensively documented in Appendix A.5. In the 
recent past the Survey has been conducted every two 
years. In future, it will be taken annually, but every 
second survey will be small scale and of a specialized 
nature so as to make it unsuitable for initial year 
model population specification. For the present purpose, 
therefore, we may only regard each second survey as 


useful. 


The SCF has the advantage of more extensive population 
coverage than either of the DNR or UIC microdata sets 
mentioned above. Coverage extends to all of the population 
of Canada with the exception of (i) persons resident in 
the territories, (ii) persons resident on Indian reserva- 
tions, and (iii) persons resident in institutions 
(e.g., prisons, mental hospitals, etc.). The Survey 
carries a wide variety of individual income and other 
information (see the complete documentation of SCF data 
in Appendix A.5) capable of organization on either an 
economic or census family basis. The quality of this 
data is generally very high (see Appendix A.3 and 
Appendix A.4 for comparisons with other data), and its 
availability every two years makes it timely. All 
things considered it seemed best to utilize the Survey 


of Consumer Finance as the source for the initial model 


population data. 
The Initial Model Population 


In most cases the data for an individual's initial 
year state vector are simply taken directly from the 
Survey of Consumer Finance (see Appendix A.5). 
In some cases, however, it is necessary to derive this 
information from other sources. The first instance 
where this occurs is the case of persons under 14 years 
of age. The Survey of Consumer Finance does not record 
sex, education or activity status of these persons. 

Sex status must be assigned by simulation using the 

Monte Carlo technique and known probabilities. Education 
status and activity status, on the other hand, are 
inferred in simple fashion from age. That is, it is 
assumed that the individual enters primary school at 

age 6 and proceeds mechanically through the various 
grades so that age 6 is equivalent to "grade 1" and age 


13 is equivalent to "grade 8". 
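
The age-to-grade rule for persons under 14 amounts to a one-line mapping, sketched below. The function name is hypothetical, and returning None for children under 6 (not yet in school) is a choice made for this example rather than something the text specifies.

```python
def grade_for_child(age):
    """Grade implied by age for persons under 14 (age 6 -> grade 1, age 13 -> grade 8)."""
    if age >= 14:
        raise ValueError("rule applies only to persons under 14 years of age")
    if age < 6:
        return None  # assumed: not yet in primary school
    return age - 5

print([grade_for_child(a) for a in (5, 6, 10, 13)])
```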


A similar problem with activity status exists for 
persons 14 years and over. If the individual is not in 
school according to the SCF data, his activity status 
(employment, unemployment, or non-labour force) can be 
inferred directly from his labour force status as recorded 
in the SCF data. If, on the other hand, the person is 
shown to be in school, his activity status is determined 
on the basis of age, province, and education level. 
People in high school are assigned grades according to 
age, the sequence related above being continued, 

(i.e., age 14 corresponds to "grade 9" and so on). In 
Ontario, allowance is made for the attainment of 
grade 13; other provinces are assumed to have secondary 
education up to and including grade 12. Persons in 
university are placed in either 6th year university or 
2nd year university, depending on whether or not they 


are shown by the Survey of Consumer Finance to possess 


a university degree. 


The 23 characteristics discussed above, 
taken together, "describe" one individual. The entire 
Canadian population could in principle be "described" 
in the context of the model by a large number of individual 
state vectors - one for each individual in the population. 
This would, however, be a very inefficient procedure. 
A much more economical means of accounting for the 
entire population is by way of a representative sample. 
In this case, each individual state vector in the model 
stands for or represents a larger number of "identical" 


persons in the real population. 


The model requires, as initial input, a weighted 
sample of individual state vectors which describe the 
population of Canada for some year. At the time of 
writing this report we have been working with two such 
initial years, 1967 and 1971. The Survey of Consumer 
Finance undertakes a geographically stratified sampling 
of Canadian households in April of the year following 
the survey year (e.g., April 1968 for 1967 or April 
1972 for 1971). The sample included 37,985 individuals 
over fourteen years of age and in receipt of cash 
income in 1967 and 43,039 individuals aged fourteen 
years or more who were in receipt of cash income in 
1971. Over time, of course, as more current data 
becomes available, initial year state descriptions for 
more recent periods can be constructed. It is important 
to be able to continually update the initial year as a 


guard against the generation of inaccurate projections. 
Individual state vectors are weighted by the 
Survey of Consumer Finance in order to calculate 
population estimates. These weights are unequal, 
running from 30 up to 3,000 in increments of 10. 
In order to conform to the logic of POLSIM, individual 
state vectors must possess equal weights. It was 
necessary, then, to adjust the SCF samples in such a 
way as to produce equally weighted records. We adopted 
a common weight of 50 (i.e. one individual in the sample 
represents 50 in the real population). All SCF records 
first had their weights randomly rounded to be multiples 
of 50. Next, with all record weights some multiple of 
50, we replicated each record a number of times equal 
to the quotient of the weight divided by 50. For 
example, if the original SCF record had weight 520, it 
was randomly rounded to either 500 or 550, which in turn 
yielded 500/50 = 10 or 550/50 = 11 identical records, each of 
weight 50. 
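
The random rounding and replication steps can be sketched as follows. This is an illustration rather than the project's actual routine: the function names and the dictionary representation of a record are hypothetical. The weight is rounded up with probability r/m, where r is the remainder on division by m, which leaves the expected weight unchanged.

```python
import random

def random_round(weight, m, rng):
    """Randomly round an integer weight to a multiple of m without bias."""
    g, r = divmod(weight, m)          # weight = m*g + r, with 0 <= r < m
    if rng.random() < r / m:          # round up with probability r/m
        return m * (g + 1)
    return m * g                      # otherwise round down

def replicate(record, weight, m, rng):
    """Return identical copies of a record, each carrying the common weight m."""
    rounded = random_round(weight, m, rng)
    return [dict(record, weight=m) for _ in range(rounded // m)]

rng = random.Random(50)
copies = replicate({"age": 30}, 520, 50, rng)
print(len(copies))                    # either 10 or 11 copies

# Averaged over many draws, the rounded weight stays close to the original 520.
draws = 100_000
print(sum(random_round(520, 50, rng) for _ in range(draws)) / draws)
```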


More explicitly, let the original SCF weight be W. 
We wish to randomly round this weight to a multiple of 
a positive integer m. We have then, W = mg + r where 
r = 0, 1, 2, ..., (m-1), and g is some positive integer. 
W is rounded by Monte Carlo simulation to be either 
w = mg with probability f, or w = mg + m with probability 
(1-f). In order to be unbiased we require the expected 
value of w to equal W. That is, we require 

     f(mg) + (1-f)(mg + m) = mg + r 

which yields (1-f) = r/m = Prob (W will be rounded to 
mg + m) and f = 1 - r/m = Prob (W will be rounded to mg). 

For our case, m = 50, any integer W divided by 50 
will give a residual r = 0, 1, 2, ..., 49 and the ratio 
r/50 is the probability that this residual will be increased 
to 50. 

The process of replicating sample state vectors 
increases the number of sample individuals from 79,479 
to 397,960 in 1967 and from 79,528 to 425,864 in 1971. 
This larger sample is useful for Monte Carlo simulation, 
of course, since simulation errors are thereby reduced. 
A potential difficulty, however, also attends the 
replication process. This stems from rounding errors 
in the process described above. The extent of this 
error can be assessed by comparing population estimates 
produced from the SCF unequally weighted and the POLSIM 
equally weighted samples. A comparison of Tables 2.1 
and 2.2. reveals that the rounding error is not great. 


(See also the more detailed tables of Appendix A.3). 


There remains the question of the adequacy of the
base year data as a description of the Canadian population
for the relevant year. An excellent simulation model
will obviously produce bad projections if the initial
year state description is poor. A comparison of Table
2.2 with Table 2.3 shown below provides a general idea
of the adequacy of the initial population coverage. (See
also the more detailed comparisons of Tables A3.5 and
A3.10, Appendix A.3.) In general, the 1972 SCF, and as
a consequence our 1971 initial year model population,
understates the 1971 Canadian population by one percentage
point. This understatement is worst in the case of older
age groups (i.e. persons over eighty years of age). The
Province of Saskatchewan population is, interestingly enough,
also considerably understated.
TABLE 2.1

Survey of Consumer Finance Population of Canada, [date illegible]

Age
Group        Male          Female         Total

0-9          ------ 3,992,460 ------      3,992,460
10-14        ------ 2,318,960 ------      2,318,960
15-24        [illegible]   [illegible]    [illegible]
25-44        [illegible]   [illegible]    [illegible]
45-64        1,944,560     [illegible]    3,898,430
65-95        800,620       [illegible]    1,714,180

Total                                     21,295,690

Source: Table A3.1, Appendix A3.


TABLE 2.2

1971 Initial Year Model Population of Canada

Age
Group        Male          Female         Total

0-9          2,022,150     [illegible]    3,990,600
10-14        [illegible]   [illegible]    2,313,700
15-24        1,952,900     1,940,000      3,892,900
25-44        [illegible]   [illegible]    5,479,300
45-64        1,945,550     [illegible]    3,898,150
65-95        799,200       [illegible]    [illegible]

Total        10,642,800    10,650,400     21,293,200


TABLE 2.3

Census Population of Canada, June 1971

Age
Group        Male          Female         Total

0-9          [illegible]   [illegible]    [illegible]
10-14        [illegible]   [illegible]    [illegible]
15-24        [illegible]   [illegible]    4,003,795
25-44        [illegible]   [illegible]    5,415,940
45-64        1,986,425     [illegible]    4,025,500
65+          [illegible]   [illegible]    1,744,400

Total        10,795,370    [illegible]    [illegible]

Source: Table A3.3, Appendix A3.


The April labour force status of that portion of
the population captured in the Survey of Consumer
Finances is reliably reported since the SCF is tied to
the April Labour Force Survey. We can have, therefore,
considerable confidence in these data though not so
much in the number of weeks of employment, unemployment,
etc. reported for the preceding year. Fortunately the
latter data is not utilized by POLSIM.
Assessment of the adequacy of the Survey of Consumer
Finances for purposes of measuring income can only be
carried out at the most aggregate level. Table 2.4
compares SCF income data with adjusted national accounts
personal income data. So far as one can tell from this
comparison total wages and salaries are fairly accurately
captured. The other categories of income are more or less
badly underreported. It is of interest that the SCF
performance is quite variable over time in this last respect.


TABLE 2.4

Comparison of Survey of Consumer Finance Income*
Estimates with National Accounts Adjusted Personal Income

(SCF as percent of Adj. NA)

Item                                     1967          1971

Wages and Salaries                       [illegible]   [illegible]
Non-farm Income from
  Self-Employment                        [illegible]   [illegible]
Farm Income                              [illegible]   82.4
Interest, Dividends and
  Miscellaneous Investment Income        49.4          [illegible]

* Individual series.

Source: Unpublished data, National Income and Expenditure
        Division and Household Statistics Branch, Statistics
        Canada.
3. THE IMMIGRATION BLOCK


3.1 Purpose and Overview


The objective of the immigration block is to
increment the base population in a given year by the
total number of immigrants that will arrive in that
year. This entails two distinct problems: the projection
of the total number of immigrants who are to arrive in
the given year, and the synthesis of individual state
vectors for each of these individuals. The present
version of the model does not attempt to determine
projections of the total number of immigrants. Such
projections are simply assumed by the model to be
exogenously determined. The immigration block is
therefore concerned principally with the construction
of individual state vectors for a pre-determined number
of people.

Since the arrival of immigrants is distributed
over all 12 months of the year, there is some question
as to when exactly they should be added to the population
base. It is assumed, for purposes of the model, that
immigrants who arrive throughout a given year will
become part of the population base for the whole year.
This in effect means that all immigrants in a given
year arrive on January 1 of that year, and thus ignores
the difficulty of handling people who are only present
in the population for part of the year.


3.2 General Structure


The general structure of the immigration block may
be examined with reference to the flow chart in figure
3.1. The model begins by reading in all of the constant
parameters. These include distributions of the immigrant
population over age, sex, marital status and province,
income distributions of new immigrants, and so on.


FIGURE 3.1 THE IMMIGRATION BLOCK

[Flow chart; the boxes read, in order:]

  1. Read all constant parameters.
  2. Read all variable parameters (total immigrants,
     unemployment rate, weight factor).
  3. Determine the total number of people in each
     age-sex-province-marital status category.
  4. Ensure that the number of married men equals the
     number of married women in each province.
  5. Initialize the family unit number.
  6. Enter the loop for all 10 provinces.
  7. Enter the loop for all single men in the given
     province: calculate the state vector for the given
     single man.
  8. Enter the loop for all single women in the province:
     calculate the state vector for the given single woman;
     repeat until the last single woman in the province.
  9. Enter the loop for all families in the province:
     calculate the husband's state vector; calculate the
     wife's state vector; if the family has no children,
     set the family size for childless couples; otherwise
     determine how many children, set the family size for
     the whole family, and calculate the state vectors of
     all the children; repeat until the last family in
     the province.
 10. Write the state vectors of all immigrants in the
     province on the output tape.


The following three exogenous input parameters are then
read in:


1. W1 - The weighting factor that applies to the base
   population. This is an integer, and in the present
   version of the model is 50.

2. RATE - The national mean unemployment rate
   in decimals for the year in question.

3. TOTIMG - The total number of immigrants that are
   expected to arrive in the year simulated (including
   children). The number will be an integer in the
   range of 100,000 to 150,000.


Once all of the input parameters have been read
into the model, the determination of the immigrant
population itself can begin. The first step is to
calculate the total number of people in each of 240
age-sex-marital status-province classes. These classes,
it should be noted, consist only of adults. The numbers
of children in given classes are determined after
married couples are formed. The number of adults in a
given class is the product of the total number of
immigrants arriving and the probability of being in a
given class, divided by the weighting factor.
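
As an illustration of this calculation, the short Python sketch below computes the number of simulated adults in a class as the total number of immigrants times the class probability, divided by the weighting factor. The function name class_counts, the two class probabilities and the rounding rule are assumptions made here for the example; the report does not state how fractional counts are treated.

```python
def class_counts(totimg, weight, class_probs):
    """Simulated adults per class: TOTIMG * P(class) / weight.

    Rounding to the nearest whole record is an assumption of this sketch;
    the report does not say how fractional counts are handled.
    """
    return {cls: int(totimg * p / weight + 0.5)
            for cls, p in class_probs.items()}

# Hypothetical probabilities for two of the 240 classes
# (age group, sex, marital status, province).
probs = {
    ("20-30", "male", "single", "ONT"): 0.030,
    ("20-30", "female", "married", "QUE"): 0.012,
}
counts = class_counts(totimg=121_900, weight=50, class_probs=probs)
print(counts)
```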
The above procedure does not guarantee that the
number of married men in a given province will equal
the number of married women in that province. The
reason for this arises from the fact that no data is
kept by which married people can be linked to their
spouses, and because the data that does exist is such
that the number of married women immigrants always
exceeds the number of married men immigrants. Two
means of dealing with this problem are possible. The
first is to assume that the excess females consist of
either (a) widows who list their marital status as
married, or (b) women who are planning to join husbands
who arrived in earlier years, or (c) women whose husbands
will arrive in subsequent years. It would then be
necessary to identify to which group each of the excess
females belongs, and to try to come up with some
reasonable method of "disposing" of them from there.
The second approach would be to adjust the raw totals
so as to equate married males with married females.
This latter method was the one adopted, because the
problem affects only a small percentage of the immigrant
population, and because inadequate knowledge precludes
any reasonable handling by the more sophisticated
approach.


Since the data indicates that the largest group of
married women falls into the second age group (20-30)
the adjustment procedure was simply to delete the
excess from this group. The number of excess married
women was approximately 3000 (2.5% of the total immigrant
population), so we are assured that this procedure does
not constitute a gross distortion.
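
The adjustment can be pictured with the following minimal Python sketch, which removes the excess of married women from a single age group within one province. The function name equalize_married, the list layout (one count per age group, in order) and the counts used are hypothetical.

```python
def equalize_married(men_by_age, women_by_age, adjust_group=1):
    """Delete the excess of married women from one age group.

    Both arguments hold counts per age group for one province, in order;
    index 1 is the second age group (20-30).
    """
    excess = sum(women_by_age) - sum(men_by_age)
    adjusted = list(women_by_age)
    if excess > 0:
        if excess > adjusted[adjust_group]:
            raise ValueError("excess exceeds the size of the adjusted group")
        adjusted[adjust_group] -= excess
    return adjusted

# Hypothetical counts for one province.
men = [1, 10, 8, 4, 2, 1]
women = [2, 14, 8, 4, 2, 1]
adjusted = equalize_married(men, women)
print(adjusted, sum(adjusted) == sum(men))
```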
The family identifier number is then initialized
at 1. These identifier numbers will be changed so as
to fall into the combined population sequence when the
immigrant population is merged with the base population.
The procedure ensures that all identifier numbers will
be higher than those for the base population.


The province loop is now entered. State vectors
are determined for all the people in a given province,
and these are then written out on tape. State vectors
for single males are calculated first. The procedure
is to consider one individual at a time, calculating
his entire state vector (cf. section 3.3 below). The
next individual is then considered, and so on, until
all single males have been dealt with. An analogous
procedure is then repeated for all single females, and
finally for all members of families. The family loop
includes the calculation of the number of children a
given family will have, as well as these children's
respective state vectors.


Once all the immigrants in a given province have 
been created, their state vectors are written out, and 
the calculations are repeated for the next province. 
The final output from the immigration block consists of 
state vectors describing all of the immigrants in each 
of the ten provinces. These state vectors are now 
ready to enter the demographic block along with the 


remainder (i.e. non-immigrant portion) of the population. 
3.3 Calculation of the State Vectors


All of the 23 elements in the individual state 
vector (cf. section 2.1) must be assigned to each of 
the created immigrants. In many cases, assignment of a 
particular element is straightforward. (Sex and marital 
status for single men, for example.) In other cases, 
however, the elements must be determined in some 
probabilistic fashion, or else inferred from some other 
element or some other set of data. These latter cases 


will be discussed below. 


Age presents the first difficulty. We know that
each individual must fall into one of six age groups,
and we know further how many individuals must be in
each group (from the age-sex-province-marital status
breakdown arrived at above). The procedure is to
simply keep track of how many individuals have been
added to a given age group, beginning with the first,
until that group is filled up; and then to proceed to
the next group, and so on. The age assigned will be
the midpoint of the relevant group (i.e. 17, 25, 35,
It is assumed that the number of weeks employed
for all immigrants is 52, and that the number of weeks
in each of the other activity states is zero. The only
exception to this rule is for married women in the
non-labor force. They are assigned zero weeks of employment
and 52 weeks in the non-labor force. These assumptions
are made only for the purpose of ensuring that weekly
wage rates can be calculated if necessary in the Activity
Block. The actual number of weeks that the immigrant
will in fact spend in each of the four states will be
determined by the Activity Block, given his January
activity status in the year of immigration. The
January activity status is determined for all single
people (excluding children) and married men depending
on the mean unemployment rate which is entered as
exogenous input and the assumption that they must
either be employed or unemployed. Married women, on
the other hand, are assumed to be capable of entering
the non-labour force category as well. Their activity
status is determined depending both on the participation
rate of married females and the January unemployment
rates. The "Employment Category" for labour force
participants thus determined is assumed to be Class B
(TYPE-25) unless the person earns more than $9,000, in
which case he is assumed to be Class A (TYPE-15).* The
education category is determined by sampling from an
education-age-sex distribution, and initial employment
income is determined from the income-sex-marital status-
age distributions (cf. data description, section 3.4).
All other sources of income are assumed to be initially
zero. The major source of income is therefore wages
and salaries.
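
The assignment of January activity status and employment category described above might be sketched as follows. This is a hedged illustration only: the function names, the status labels and the exact way the participation rate and the unemployment rate are combined for married women are assumptions, since the report gives the rule only in words.

```python
import random

def january_status(is_married_woman, unemployment_rate,
                   participation_rate, rng=random):
    """Draw a January activity status for one adult immigrant."""
    # Only married women may fall outside the labour force (assumed order:
    # participation is decided first, then employment).
    if is_married_woman and rng.random() >= participation_rate:
        return "not in labour force"
    if rng.random() < unemployment_rate:
        return "unemployed"
    return "employed"

def employment_category(income):
    """TYPE 15 (Class A) above $9,000 of income, otherwise TYPE 25 (Class B)."""
    return 15 if income > 9000 else 25

rng = random.Random(3)                 # fixed seed for a repeatable sketch
sample = [january_status(True, 0.06, 0.2022, rng) for _ in range(10)]
print(sample)
print(employment_category(12500), employment_category(5500))
```

The unemployment rate of 0.06 used in the example is hypothetical; the participation rate of .2022 is the figure derived in section 3.4.3.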


In the family section it is necessary to create
family units: husbands, wives, and children. A preliminary
step toward creating these family units has already


* TYPE is the variable which defines the way in which a person
relates to the labor force. TYPE 15 persons are those who are
assumed to never become unemployed and who will receive a pension
on retirement. TYPE 25 persons will also receive a pension on
retirement but differ from TYPE 15 individuals in that they
are assumed to be subject to unemployment. These definitions
are explained more fully in Appendix A.2.



been taken. As discussed above, the number of married
females in a given province has been adjusted so as to
equate with the number of married males. The difficulty
now is to pair these couples up. This is done solely
on the basis of age: the first woman in the first age
group is "married" to the first man in the first age
group, the second to the second, and so on, until the
last woman in the last female age group is paired with
the last man in the last male age group. Because the
number of males in a given age group does not necessarily
equal the number of females in that age group, there
will be some men married to women in lower (or higher)
age groups, depending on the relative numbers in each
group. But because the totals over all age groups in a
given province are equal for both sexes, all people
will eventually find a spouse.
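
The pairing rule can be illustrated with a small Python sketch. It lists the married men and the married women of one province in age-group order and matches them position by position. The function name pair_spouses and the counts in the example are hypothetical.

```python
def pair_spouses(men_by_age, women_by_age):
    """Pair married men and women of one province in age-group order.

    Arguments are counts per age group; the result is a list of
    (husband_age_group, wife_age_group) index pairs.
    """
    husbands = [g for g, n in enumerate(men_by_age) for _ in range(n)]
    wives = [g for g, n in enumerate(women_by_age) for _ in range(n)]
    if len(husbands) != len(wives):
        raise ValueError("married men and women must be equal in number")
    return list(zip(husbands, wives))

# One man in the first age group and two in the second, against two women
# in the first group and one in the second.
pairs = pair_spouses([1, 2], [2, 1])
print(pairs)
```

In this example the second couple joins a man from the second age group with a woman from the first, which is the cross-group pairing the text describes.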


The method by which children are assigned to
parents proceeds as follows. Children are assumed to
fall into three age groups: 0-9, 10-14, and 15-19.
The mothers who are then allowed to have these children
are themselves restricted to three age groups: 20-30,
31-40, and 41-50. That is, all women over 50 and
younger than 20 are assumed to have no dependent children.
(It should be noted that this probably produces some
distortion. But in the absence of data linking family
records, any assumption will necessarily be somewhat
arbitrary.) It is then assumed that mothers in a given
age group will only have children from the same ordinal
group. That is, mothers from the first mother age
group will only have children from the first child age
group, and so on. The number of children in the given
age group that a woman will have is then determined by
sampling from a probability distribution. This number
may be 0, 1, 2, or 3. If more than one child is assigned
it is assumed that they are one year apart in age. The
maximum age is assigned to the first child, and lower
ages to subsequent children. These maximum ages are
respectively 7, 13, and 18, depending on the age group
being considered. The sex of the child is determined
randomly, assuming half will be boys and half girls.
The income of children is assumed to be zero. Family
size for all the family members is then set after all
the children in the family have been created.
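
A minimal Python sketch of this step follows. The maximum ages (7, 13 and 18) and the one-year spacing are taken from the text; the function names, the record layout and the cumulative probabilities in the example are assumptions made for illustration.

```python
import random

# Age of the first (oldest) child for each eligible mother age group.
MAX_CHILD_AGE = {"20-30": 7, "31-40": 13, "41-50": 18}

def sample_count(cumulative, rng=random):
    """Draw a number of children (0 to 3) from a cumulative distribution."""
    u = rng.random()
    for k, c in enumerate(cumulative):
        if u < c:
            return k
    return len(cumulative) - 1

def make_children(mother_group, cumulative, rng=random):
    """Create the child records for one family."""
    if mother_group not in MAX_CHILD_AGE:
        return []                       # wives under 20 or over 50
    n = sample_count(cumulative, rng)
    oldest = MAX_CHILD_AGE[mother_group]
    return [{"age": oldest - i, "sex": rng.choice(["M", "F"]), "income": 0}
            for i in range(n)]

rng = random.Random(7)                  # fixed seed for a repeatable sketch
# Hypothetical cumulative probabilities of 0, 1, 2 and 3 children.
print(make_children("31-40", [0.40, 0.70, 0.90, 1.00], rng))
```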


3.4 Parameter Estimation 


3.4.1 Demographic Variables 


The first set of parameters in the immigration
block is the distribution of new adult immigrants over
age, sex, marital status, and province of residence.
More specifically, PEOPL (I, J, K, L) is the number of
adult immigrants of age I, sex J, and marital status K
arriving in province L. The codes for each index are
as follows:

     AGE = I = 1 for 15 - 19
               2 for 20 - 30
               3 for 31 - 40
               4 for 41 - 50
               5 for 51 - 65
               6 for 65 +

     SEX = J = 1 male
               2 female

     Marital Status = K = 1 single
                          2 married
This raw data is derived from the landing records
of 1971 immigrants* and was obtained from the Information
Analysis unit of the Programs and Procedures Branch of
the Department of Manpower and Immigration. Given this
data, and the total number of immigrants arriving in
1971 (121,900), it is possible to derive the probability
that an adult immigrant will be in a given age-sex-
marital status-province class (see Appendix B for this
distribution). At present, it is assumed these probabilities
are stationary over time.

A clear deficiency in the calculation of this
distribution is that it is based on data from a single
year, 1971. A superior method would be to estimate the
distribution, from time-series data, as functions of
economic conditions both in Canada and abroad and
possibly other variables as well. This would be a
major study in itself, feasible in a future version of
the POLSIM model.


3.4.2 The Assignment of Children 


The assignment of children to families is hampered 


by the fact that the raw data available does not link 


* Landing records are collected from all immigrants upon
their arrival in Canada. These contain information on the
person's age, sex, marital status, etc. The Department of
Manpower and Immigration collects all of these records, and
stores them on magnetic tape.
children to their parents. This limitation is further
exacerbated by questions of pure theory: in principle,
one desires a distribution of different family sizes
corresponding to differing ages of family members (for
example, how many children and of what ages is a 35
year old mother likely to have?). It is clear that the
number of possible combinations here is very large; so
large, in fact, that a distribution calculated on the
basis of the immigrant population would probably lack
statistical significance, even if it could be derived.


The data that is available (from Immigration 
Landing Records) consists of a breakdown of the number 
of children arriving in 1971 by age, sex, and province. 
The problem is to assign these children to parents in 
some reasonable way. The method by which this is done 


is described below. 


As discussed previously, it is first necessary to
make the following assumptions:

(a) wives older than 50 or younger than 20 have no
    dependent children;

(b) all children in the 15-19 age group are assigned
    to mothers in the 41-50 age group;

(c) all children in the 10-14 age group are assigned
    to mothers in the 31-40 age group;

(d) all children in the 0-9 age group are assigned to
    mothers in the 20-30 age group;

(e) all women have no more than three children, and
    there exists some probability of their having a
    given number (0, 1, 2, or 3).


The problem then reduces to calculating the probability
that a woman in a given age group and a given province
will have a given number of children. That is, it is
desired to calculate CHILD (I, J, K), the cumulative
probability that a woman in province I and age group J
will have K children. The indices are as follows:
In general, the way in which these probabilities
were derived is as follows (the details and actual
distributions are presented in Appendix B).

     Let  Y   = number of wives in a given age group
                in a given province.

          X   = number of children in the province to
                be assigned to women in this age
                group.

          p_i = probability of a wife in this age
                group and this province having i
                children (i = 0, 1, 2, 3).

The probabilities must sum to one,

     p_0 + p_1 + p_2 + p_3 = 1                       (1)

and the expected number of children must equal the
number to be assigned,

     E(children) = p_1 Y + 2 p_2 Y + 3 p_3 Y = X

where E is the mathematical expectation operator,

     or   p_1 + 2 p_2 + 3 p_3 = X/Y                  (2)

We thus have two equations in 4 unknowns. This
problem is resolved by a priori determination of two of
the probabilities by inspection of the data (see Appendix
B). The remaining two probabilities are then determined
by simultaneous solution of equations (1) and (2). In
this way the entire array, CHILD (10, 3, 4), is determined.
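
The solution step can be sketched in Python as below. The sketch assumes, purely for illustration, that the two probabilities fixed a priori are those of having two and three children; the report leaves the choice to Appendix B. The function names and the numbers in the example are likewise hypothetical.

```python
def solve_child_probs(x, y, p2, p3):
    """Solve equations (1) and (2) for p0 and p1, given fixed p2 and p3.

    x is the number of children to be assigned and y the number of wives
    in the age group and province concerned.
    """
    mean = x / y                      # right-hand side of equation (2)
    p1 = mean - 2 * p2 - 3 * p3       # from equation (2)
    p0 = 1 - p1 - p2 - p3             # from equation (1)
    probs = [p0, p1, p2, p3]
    if any(p < 0 or p > 1 for p in probs):
        raise ValueError("the fixed values leave no valid solution")
    return probs

def to_cumulative(probs):
    """Convert the probabilities into the cumulative form used for sampling."""
    total, out = 0.0, []
    for p in probs:
        total += p
        out.append(total)
    return out

# Hypothetical case: 120 children to assign among 100 wives.
probs = solve_child_probs(x=120, y=100, p2=0.2, p3=0.1)
print(probs, to_cumulative(probs))
```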


3.4.3 Participation Rate for Married Women 


The labor force participation rate for married
women is derived from the data on the "Intended Occupations
of Male and Female Immigrants Admitted to Canada 1971".
This data shows that the number of working married
females was 5407 in 1971, out of a population of 26,740
married women. The participation rate is thus 5407 /
26,740 = .2022 (cf. 1971 Immigration Statistics).


3.4.4 Employment Income Distributions 


The cumulative income distribution of certain
classes of new immigrants is derived from the longitudinal
survey of new immigrants that is currently being conducted
by the Department of Manpower and Immigration. This
survey follows three cohorts of immigrants through
their first three years in Canada. These cohorts
consist of 5,962 people from the 1969 immigrant
population (a 3.7% sample), 5,338 from the 1970 population (a
3.6% sample), and 5,368 from the 1971 population (a
4.4% sample). Data from each cohort is collected in
four questionnaires. The first is sent out 6 months
after the immigrant's arrival, the second at 1 year,
the third at 2 years, and the fourth at 3 years. The
questionnaires seek to determine the demographic
characteristics of the immigrant, his employment and
income experience, his social adaptation, and his
residential mobility. Once the questionnaires have
been returned, they are linked to the immigrant's
Landing Record and his Immigrant Assessment Record (if
the latter exists). All of this data is then available
for compiling particular distributions.

The survey has not yet been completed, and consequently
the data is at best tentative. Better distributions
will become available some time in 1975, when the
survey will be completely finished and tested.


The income distribution compiled for the present
version of the model is DOL(I, J, K, L). It is the
cumulative probability that a person of sex I (1=M, 2=F),
marital status J (1=single, 2=married), and age K will
be in income group L.

The age groups are as follows:

The average income in each group is:

     $   750
     $ 1,500
     $ 2,500
     $ 3,500
     $ 4,500
     $ 5,500
     $ 7,000
     $ 9,000
     $12,500
     $18,000

The actual distribution is given in Appendix B.
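
Sampling an initial employment income from such a cumulative distribution might look like the following Python sketch. The ten average incomes are those listed above; the function name sample_income and the cumulative probabilities in the example are hypothetical, the actual values being those of Appendix B.

```python
import bisect
import random

# Average income of each of the ten income groups, as listed above.
AVERAGE_INCOME = [750, 1500, 2500, 3500, 4500, 5500,
                  7000, 9000, 12500, 18000]

def sample_income(cumulative, rng=random):
    """Draw an income group from a cumulative distribution and
    return the average income of that group."""
    u = rng.random()
    # Index of the first cumulative probability that exceeds u.
    group = bisect.bisect_right(cumulative, u)
    group = min(group, len(AVERAGE_INCOME) - 1)
    return AVERAGE_INCOME[group]

# Hypothetical cumulative probabilities for one sex, marital status
# and age class.
cum = [0.05, 0.12, 0.22, 0.36, 0.52, 0.66, 0.80, 0.90, 0.97, 1.00]
rng = random.Random(11)                # fixed seed for a repeatable sketch
print([sample_income(cum, rng) for _ in range(5)])
```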


3.4.5 Education Distributions

The education distribution of new immigrants is
stratified by age and sex. BOOK (I, J, K) is the
cumulative probability that a person of sex I (1=M, 2=F)
in age group J will be in education class K. The
age groups are the same as in all the other arrays
described previously.

The education classes are:

     K = 1   Completed Elementary
         2   Some High School
         3   Completed High School
         4   Some College or University
         5   University Degree

Like the income data, these distributions are
derived from the longitudinal survey of new immigrants.
The actual numbers are given in Appendix B.
3.5 Validation of the Immigration Block


The validation of the immigration block consisted
of an attempt to simulate the 1971 immigrant population.
The total number of 1971 immigrants (121,900) was entered
as exogenous input and then the simulated population was
produced. This total of 121,900 corresponds to a total
of 2,438 simulated individuals at a weighting factor of
50. The number of histories actually created was 2,404,
the difference being attributable to the fact that the
number of children created in any given family is determined
probabilistically and the fact that the number of married
women was slightly reduced so as to equate with the number
of married men.


Once all of the individual state vectors were
calculated, various distributions were derived from these
simulated vectors and then compared with the corresponding
actual distributions. A summary of these comparisons
is presented in table 3.1, in which the immigrant
population, both simulated and actual, is broken down by
marital status and province. As the table indicates, the
simulation performed quite well. The more detailed tables
in Appendix B, which compare other marginal distributions,
also demonstrate that the simulation is on the whole
quite successful. It should be noted, however, that this
does not imply that the simulated joint distribution over
all 23 state variables will be in close agreement with
the actual joint distribution over these variables. The
reason for this is that even if all the marginal
distributions were independent of one another, there are simply
not enough individuals created to adequately match all of
the cells that 23 joint variables produce. (Note that if
TABLE 3.1

Summary Validation of the Immigration Block

                   Married                    Single
Province    Simulated     Actual       Simulated     Actual

NFLD.       [illegible]   410          200           189
P.E.I.      [illegible]   82           0             41
N.S.        [illegible]   814          482           550
N.B.        [illegible]   510          200           247
QUE.        [illegible]   7394         7400          7394
ONT.        [illegible]   23988        22000         21968
MAN.        [illegible]   1980         1600          [illegible]
SASK.       [illegible]   584          400           375
ALTA.       [illegible]   3454         2350          2432
B.C.        [illegible]   8000         5300          5324
each state variable was dichotomous, which is the minimum
possibility, 2^23 cells would exist). But this is not really
a difficulty for the present model. Since all of the marginal
distributions are made conditional on as many relevant
variables as existing data permits, it can be held with some
confidence that the simulated population adequately represents
the actual immigrant population.
4. THE DEMOGRAPHIC BLOCK


4.1 The Demographic Block Model
The purpose of the Demographic Block is to update
the demographic variables of the individual state
vector. These variables are the individual's province
of residence, his family size, his dependency status
(whether he is a family head, a wife, or a dependent
child), his marital status, and his age. The Demographic
Block also determines whether or not a family will
emigrate. Since the basic time unit of the POLSIM
model is one year, the updating of these state variables
is on an annual basis. The demographic block receives
the set of individual state vectors which describe the
native Canadian population for some year t plus immigrants
for the year t+1. The problem is then to determine the
necessary changes in the demographic characteristics of
the population over the time period t to t+1.


The Demographic Block is the first of three blocks
in the POLSIM model which alter individual state vectors. 
The other two, the Activity Block and the Market Income 
Block, update the activity variables (weeks employed, 
education, etc.), and the market income variables, 
respectively. The Demographic Block differs from these 
latter two in three ways. First, it eliminates and 
creates individual records or state vectors. New 


records are created in the Demographic Block through 
the simulation of births, while others are eliminated
through the processes of death and emigration. The 
second difference is that the Demographic Block proceeds 
in two distinct phases. The other two blocks mentioned 
are somewhat simpler in this respect. They require 

only a single sequential processing of all individual 
records. Two phases are necessary in the Demographic 
Block because of the need to simulate marriage. Since 
individuals in the base year population must marry 

other individuals in the same population, it is necessary 
to first determine all of the "marriageable" individuals. 
Only then can these individuals be paired. The final 
difference is that the Demographic Block deals with 
family records, whereas the other two blocks work with 
individuals only. The reason for this is that demographic 
variables are inherently family variables; that is, 
dependency status, family size, etc., can clearly only 


be determined within the context of a whole family. 


The logical structure of the Demographic Block, 
and the details of the various processes it entails, 


are presented in the two following sections. 


4.1.2 Structure of the Demographic Block 


The Demographic Block consists of two distinct 
phases. The input to the first phase consists of the 
file containing the native Canadian population of the 
previous year together with the population of new 


immigrants for the current year. The first phase of 
the block then produces three output files. The first
output file contains the records of all individuals who
either emigrate or die during the year being simulated
(i.e., the emigration and death registries). The second
output file contains the records of all individuals who have
been determined to be eligible for marriage during the
simulation year. This file, called the "Marriage Pool",
will be used in the second phase of the Demographic Block.
The third output file contains the output records of all
remaining individuals (all those who have neither died,
emigrated, nor been deemed ready for marriage). In phase one
of the Demographic Block, then, all individuals who have not
died or emigrated will have had all their demographic variables
completely updated. Persons who will marry, on the other
hand, are updated in all respects except for "pairing" and
their province of residence.


The second phase is exclusively concerned with persons 
in the marriage pool and determines "who will marry whom". 
The individuals who had previously been recorded in the 
marriage pool (i.e., the second file mentioned above) are 
formed into couples on the basis of age, province of residence 
and education. The demographic characteristics of these 
couples, with the exception of province of residence, have 
all been updated in phase one and phase two now determines 
the province of residence of the new couples, in the same 
manner as is true of persons who are not marrying between t
and t+1. (See section 4.1.3 below). All the couples in
this "Marriage Pool" file are then merged with the records
in the third output file (i.e., individuals who do not die,
emigrate or marry during the simulation period) of phase
one.


The basic method by which the individual state
variables are updated is Monte Carlo simulation.
Conceptually, this method is quite simple. Consider a
married woman of age 17. The probability that she will
give birth to a child within a period of a year is .45.
If we wish to determine whether or not this woman will
give birth to a child during a given year, we proceed
as follows. We first choose a random number in the
interval (0, 1). If this number is less than or equal
to .45, we decide that this particular woman will have
a child during the simulated year. If the random
number is greater than .45, she will not. All of the
stochastic decisions in the POLSIM model are made in an
analogous manner.
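
The decision rule just described can be sketched in a few lines of Python. This is a minimal illustration and not the POLSIM source: the function name `monte_carlo_event`, the fixed seed and the sample size are assumptions introduced here, and the .45 figure is simply the one quoted in the text.

```python
import random


def monte_carlo_event(probability: float, rng: random.Random) -> bool:
    """Decide one stochastic event by comparing a uniform draw with its probability."""
    u = rng.random()  # uniform draw in [0, 1)
    return u <= probability  # the event occurs if the draw does not exceed the probability


# Hypothetical usage: apply the birth decision to 10,000 identical women.
rng = random.Random(1975)  # fixed seed (an assumption) so that the run is repeatable
births = sum(monte_carlo_event(0.45, rng) for _ in range(10_000))
birth_rate = births / 10_000  # expected to lie close to .45
```

Any single decision is random, but over many individuals the simulated frequency approaches the descriptive probability, which is what the aggregate results rely on.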


The logical structure of the Demographic Block can 
be seen in more detail with reference to the flow chart 
in figures 4.1 and 4.2. All decision processes that 
are to be resolved stochastically by the above Monte 
Carlo method are represented in the flow chart by 
hexagons. Deterministic decisions are indicated by 


the standard diamond branches. 


The model begins by reading a family unit from the 
Initial Year Tape. The first decision to be made is 
that of whether or not the individuals in the family 
will survive. Survival is assumed to be an individual 
matter. All individuals within the given family are 
processed one by one. If the Monte Carlo procedure
determines that the person will die, he is recorded on
the death registry, and his record is not considered further.
If nobody in the family survives, another family record is
read from the Initial Year file. Persons who survive are
aged by one year and then proceed to the birth process.


FIGURE 4.1 - DEMOGRAPHIC BLOCK FLOW CHART
(PHASE 1)


FIGURE 4.2 - DEMOGRAPHIC BLOCK FLOW CHART
(PHASE 2)

Flow chart note (Figure 4.2): Using N(s,a,e), the population
of heads by sex, age and education, prepare a list, L(s,i),
which indicates that the i-th head of sex s will marry the
L(s,i)-th head of the opposite sex. If L(s,i) = 0 it means
that there was no available mate for head (s,i).


Births may be legitimate or illegitimate. If the woman
is married, and it is determined that she will have a child,
the child is simply added to the given family unit. If,
however, she is not married and she is not the head of a 
family unit, she and her baby are assumed to form a new 
family unit; and, of course, she becomes the head of this 


new unit. 


The divorce process is now commenced for all families 
containing both a head and a spouse. If divorce is determined 
to occur, the family is split in two. All children are
assumed to go with the mother into a new, separate family 


unit with the mother as head. 


All non-married individuals, excluding those who have 
become divorced in the current simulation, are now tested to 
determine whether they will get married. If this event 
occurs, the person is declared to be the head of a new 


family unit.


All individuals in each of the original families who 
are dependent (i.e., neither heads nor wives) are now tested 
to see whether they will leave home or not. If this event 
occurs, the individual is declared to be the head of a new 


family consisting of himself alone. It is clear that each
of the original family units might generate a whole set of
new family units through marriage of dependents, divorce,
birth of an illegitimate child, or as a consequence of a


dependent leaving home. 


For all families it is necessary to determine whether 
or not they will emigrate. This is considered to be a 
family decision, rather than an individual decision. All
families that do emigrate are recorded on the emigration 


registry, and are not considered further. 


Those family units that do not emigrate, with the 
exception of those who are going to be married in the 
current simulation, are now processed to determine their new 
province of residence. As mentioned above, province of 
residence of newly married couples is determined in phase 


two, after marriage has taken place. 


The final step of the first phase is to record all 
family units on the proper files. Those families whose head 
is to be married are recorded on the marriage pool tape 
(file #2), while all others are recorded on file #3. (It
will be recalled that file #1 contains the death and emigration
registries.) At this stage file #3 contains all remaining
families, with the exception of new families formed through
marriage. All demographic variables of the individuals on
the third file have been completely updated.


In the second phase of the Demographic Block those
individuals who are to be married are processed to form
couples. Once this is completed, all the new husbands
and wives, together with any dependents they might
have, are formed into family units. The province of
residence of these families is then determined, and
they are then merged with the families of file #3 to
create the completely updated initial year file. This
is the final product of the Demographic Block, which 


now becomes input to the Activity Block that follows. 


4.1.3 The Demographic Block Processes 


The previous section made reference to several 
"processes": birth, survival, marriage, etc. In this 
section we will elaborate on each of these. Before 
doing so, however, it will be worthwhile to briefly 
discuss the concept of a "process" itself. The term is 
very ambiguous, with a different meaning in almost 
every discipline. In mathematics, however, a process 
consists of a variable which is a function of time, and
whose evolution over time is governed by certain underlying 
rules which might be either stochastic, deterministic, 


or some combination of the two. 


Consider now the "death process" which we have 
referred to in the previous section. In light of the 
above definition, how is this process to be understood? 
All individuals can be assumed to exist in either one 
of two states: an individual can either be alive, or 
he can be dead. This suggests that we can define a
"life state" variable, $\ell(t)$, which is defined in the
following way:

$$
\ell(t) = \begin{cases} 1 & \text{if he is alive at time } t \\ 0 & \text{otherwise} \end{cases}
$$

Defined in this way, the variable $\ell(t)$ is a stochastic
process. The time of death, which is what we must
determine, is postulated to be random. We attach to
every individual a probability P that he/she will die
in a period commencing at time $t_0$ and terminating at
time $t_0 + \Delta t$, where the interval $\Delta t$ is fixed. In the
POLSIM model, the time interval is one year.


In what follows we do not rigorously define each 
process in the above fashion. We simply
describe the processes of the Demographic Block in a
very general way. It should be understood, however,
that underlying each process is an involved set of
mathematical relationships: there is an ensemble of
time functions, a state space, and a function mapping
each element of the ensemble at each point in time to a
unique element of the state space. There is also a
probability distribution over the elements of the state
space. This ensemble of time functions represents the
histories of individual members of the nation's population,
and is not, of course, completely known. What is
known is a cross section of this ensemble at a given 
time and probabilities implicitly describing the possibilities 
of the future cross sections. The microsimulation 
technique selects from these possibilities in a random 
way, on the assumption that at the aggregate level the 


cross section properties will be preserved. 


These relationships will only be implicit in the 


discussions that follow.


(a) The Death Process


This is a stochastic process applied universally 
to all individuals. The descriptive probabilities are 
dependent on age, sex, and time. These probabilities 
could be regionalized as well, but current evidence 
suggests that regional differences are too slight to 
warrant the added complexity. Full details as to the
estimation and evaluation of the death probabilities 


are given in section 4.2 below. 
(b) The Emigration Process 


This is a stochastic process which is applied to 
family units. We postulate that whole families, rather 
than individuals, are the units that emigrate. The 
descriptive probabilities depend on the characteristics 
of the head of the family, namely, his marital status, 
age, and sex. These probabilities are assumed to be 


stationary in the present version of the model. 
(c) The Birth Process 


This is a stochastic process that is applied to
females between the ages of 14 and 49. The descriptive
probabilities depend on the age of the potential mother,
her marital status, and her birth parity (i.e., the


number of children borne alive to date). 


(d) The Divorce Process 


Divorce is a stochastic process that applies to 


married couples. It takes place in two stages. In the
first stage the spouse who will be the possible initiator
of the divorce is determined. That is, it is assumed 
that if a divorce is to take place at all, one of the 
spouses, and only one, must initiate the process. It 


is assumed that it is equally likely that this will be 


the husband or the wife. 


Once the possible initiator has been determined, 
the question of whether there will actually be a divorce 
is addressed. The probability of divorce depends now 
upon the sex and age of the initiating spouse. The 
decision is then made by the Monte Carlo procedure 


outlined earlier. 
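
The two stages of the divorce process can be sketched as follows. This is an illustrative outline only: the function names, the sex codes "M" and "F", and the numbers in `example_divorce_prob` are assumptions made here and are not the Appendix C parameters.

```python
import random
from typing import Callable


def simulate_divorce(husband_age: int, wife_age: int,
                     divorce_prob: Callable[[str, int], float],
                     rng: random.Random) -> bool:
    """Return True if the couple divorces during the simulated year."""
    # Stage 1: husband and wife are equally likely to be the possible initiator.
    if rng.random() < 0.5:
        sex, age = "M", husband_age
    else:
        sex, age = "F", wife_age
    # Stage 2: Monte Carlo decision using the initiator's sex and age.
    return rng.random() <= divorce_prob(sex, age)


def example_divorce_prob(sex: str, age: int) -> float:
    """Hypothetical placeholder values, NOT the parameters listed in Appendix C."""
    return 0.010 if age < 40 else 0.005


rng = random.Random(42)  # fixed seed (an assumption) for repeatability
divorced = simulate_divorce(35, 33, example_divorce_prob, rng)
```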
(e) The Marriage Process 


Marriage is a stochastic process applied to individuals 
age 14 and over. It is decomposed into two sequential 
decisions. The first is the decision as to whether the 
person will get married or not. This is determined 
through a simple Monte Carlo procedure, where the 
descriptive probabilities depend upon the age, sex, 
region of residence, and marital status of the indivi- 
dual involved. The second decision deals with the 
problem of "who will marry whom". All individuals for 
whom it has been decided that marriage will occur are 
collected in a marriage pool. The second stage of the 


marriage process then matches the males and females in 


this pool.


The matching process proceeds as follows. The
males and females are first partitioned into 300 age-
education-province classes (ten provinces, ten age
classes, and 3 education classes). Within each province,
it is then necessary to designate a "choosing sex" and
an "accepting sex". This is required because the
descriptive probabilities for this process depend on
sex. That is, we know the probability that a person of
age "a", sex "s" and education "e" will marry a
person of age "a'" and education "e'". Since we do not
wish to give priority to either sex, we alternate which
sex is the "choosing" and which the "accepting". More
precisely, the algorithm adopts the following technique:


(i) Begin by designating males as the "choosing sex" 


and females as the "accepting sex". 


(ii) Then sweep all of the age-sex classes in the 
choosing sex set, and within each set select 
approximately 10% of the population. Each of 
these selected individuals is then matched to an 
individual in the "accepting" set, on the basis of 


the above mentioned probabilities. 


(iii) Next switch the "choosing" sex. If there are more 
couples to be formed, we proceed as in step (ii). 


If not, the process simply terminates. 
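
Steps (i) to (iii) can be sketched as the loop below. It is a simplified illustration, not the POLSIM routine: it treats the pool as a single class instead of sweeping the age-education classes, and the names `match_pool`, `pick_partner` and `uniform_pick` are assumptions introduced here. In particular, `uniform_pick` stands in for the age and education matching probabilities described in the text.

```python
import random
from typing import Callable, List, Tuple


def match_pool(males: List[str], females: List[str],
               pick_partner: Callable[[str, List[str], random.Random], str],
               rng: random.Random, fraction: float = 0.10
               ) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Alternate the choosing sex until one side of the pool is exhausted."""
    choosing, accepting = list(males), list(females)
    choosing_is_male = True  # step (i): males choose first
    couples: List[Tuple[str, str]] = []
    while choosing and accepting:
        # Step (ii): about 10% of the choosing side, at least one person,
        # and never more than the number of partners still available.
        n = min(max(1, round(fraction * len(choosing))), len(accepting))
        for chooser in rng.sample(choosing, n):
            partner = pick_partner(chooser, accepting, rng)
            choosing.remove(chooser)
            accepting.remove(partner)
            # Couples are always stored as (male, female).
            couples.append((chooser, partner) if choosing_is_male else (partner, chooser))
        # Step (iii): switch the choosing sex.
        choosing, accepting = accepting, choosing
        choosing_is_male = not choosing_is_male
    unmatched_males, unmatched_females = (
        (choosing, accepting) if choosing_is_male else (accepting, choosing))
    return couples, unmatched_males, unmatched_females


def uniform_pick(chooser: str, candidates: List[str], rng: random.Random) -> str:
    """Placeholder rule (an assumption): every remaining candidate is equally likely."""
    return rng.choice(candidates)


couples, left_m, left_f = match_pool(
    [f"m{i}" for i in range(7)], [f"f{i}" for i in range(5)],
    uniform_pick, random.Random(0))
```

With seven males and five females in the pool, five couples are formed and two males are left over, which corresponds to the "leftover" population discussed in the text.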


At this point we should mention that the matching
probabilities are adjusted each time a certain sex,
age, education group becomes empty. It is clear that
as matching progresses this will happen to the groups
of individuals with the same sex, age and education. When
this happens for a certain group, it is evident that
the probabilities for an individual of the opposite sex
to marry somebody from the group in question must be
adjusted to zero. On the other hand, the rest of the
probabilities must be adjusted so that they represent
probability distribution functions, i.e., they sum up
to unity. The adjustment we make is proportional.
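
The proportional adjustment can be illustrated as follows. The function name `renormalise`, the group labels and the probabilities are hypothetical and serve only to show the arithmetic.

```python
from typing import Dict, Set


def renormalise(probs: Dict[str, float], empty_groups: Set[str]) -> Dict[str, float]:
    """Zero the probabilities of exhausted groups and rescale the rest to sum to one."""
    adjusted = {g: (0.0 if g in empty_groups else p) for g, p in probs.items()}
    total = sum(adjusted.values())
    if total == 0.0:
        return adjusted  # no potential partners remain in any group
    return {g: p / total for g, p in adjusted.items()}


# Hypothetical example: the 25-29 group of the accepting sex has been used up.
before = {"20-24": 0.5, "25-29": 0.3, "30-34": 0.2}
after = renormalise(before, {"25-29"})
```

The remaining probabilities keep their relative sizes: .5 and .2 are each divided by .7.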


Since it is extremely unlikely that there will be 
an equal number of males and females designated for 
marriage within a given province, it is necessary to 
marry some individuals from different provinces. This 
will still likely leave an excess of one sex or the 
other, since the Canada wide totals are unlikely to be 
equal either. All individuals who are not able to find 
a mate in the given year are then recorded on the 
updated file with their original marital status. It 
was originally intended to give these individuals 
priority for marriage in the subsequent year, but the 
small size of this "leftover" population does not 


warrant the added complexity that would be involved. 
(f) The Family Independency Process


This stochastic process is applied to all individuals
who are dependents. The object of the process is to
determine whether or not a dependent individual will
leave his family or continue to stay within it for
another year. The descriptive probabilities are dependent
on sex and age. In general, the probability of leaving
home increases with age, then stabilizes and finally
remains constant for all older ages, although it depends
on sex.
(g) Interprovincial Migration Process 


Like the emigration process, internal migration
applies to family units, rather than individuals. The
object of the process is to determine whether or not a
family will move to another province, and if so, which
one. First, one determines whether or not the family
will remain in their present region. By region we mean
the standard geographic division of Canada: the Atlantic
provinces, P.Q., Ontario, Prairies and B.C. (Incidentally,
more than 90% of the families stay in their
province of residence.) The descriptive probabilities
for this decision depend on the region, age, and income
of the family head. Second, if the family moves out of
its current region of residence, their new region of
residence is determined. The descriptive probabilities
of this decision form a transition matrix (with diagonal
elements zero) which provides transition probabilities
from region to region. This transition is postulated
to be independent of income and age. However, one
should notice that the decision to move out of a region
depends on age and income of the family head. Third,
if the new region of residence is P.Q., Ontario or B.C.,
the new province of residence is determined and no
further action is necessary. If, however, the new
region is either Maritimes or Prairies a decision is
taken as to which specific province the family will
move to. The descriptive probabilities of this decision
form rectangular transition matrices (one for Maritimes
and one for Prairies) which provide transition probabilities
from (old) province of residence to the restricted set
of provinces (either Atlantic or Prairies). Again this
transition is postulated to be independent of income
and age.


We assume that interprovincial migration of family 
units depends on the place of residence of the family
head, his (or her) age and income. Statistical analysis 
showed that indeed all three characteristics are relevant 
to the probabilities in question. However, since the 
number of migrants is very small, compared to those who 
stay at their present place of residence, it was technically 
impossible to disaggregate transition probabilities by 
income and age. Instead we used the characteristics of 
age and income only to determine whether or not a particular 
individual will stay at his present region of residence. For 
those who move out of their region, their new residence is 


affected solely by their present place of residence. 
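
The sequence of migration decisions can be sketched as below. This is an outline under stated assumptions, not the POLSIM code: the province abbreviations, the function names `draw` and `migrate`, and all probabilities in the usage example are hypothetical. The probability of leaving the region, `p_leave`, is taken as given by the caller, since in the text it depends on the region, age and income of the family head.

```python
import random
from typing import Dict

# Regions as named in the text; the province codes are assumptions made for this sketch.
REGION_PROVINCES = {
    "Atlantic": ["NFLD", "PEI", "NS", "NB"],
    "Quebec": ["QUE"],
    "Ontario": ["ONT"],
    "Prairies": ["MAN", "SASK", "ALTA"],
    "BC": ["BC"],
}


def draw(dist: Dict[str, float], rng: random.Random) -> str:
    """Draw one outcome from a non-empty discrete distribution {outcome: probability}."""
    u = rng.random()
    cumulative = 0.0
    for outcome, p in dist.items():
        cumulative += p
        if u < cumulative:
            return outcome
    return outcome  # guard against rounding error in the cumulative sum


def migrate(province: str, region: str, p_leave: float,
            region_matrix: Dict[str, Dict[str, float]],
            province_matrices: Dict[str, Dict[str, Dict[str, float]]],
            rng: random.Random) -> str:
    """Return the family's province of residence after the migration process."""
    # First decision: does the family leave its present region?
    if rng.random() > p_leave:
        return province
    # Second decision: new region, from a region-to-region matrix with zero diagonal.
    new_region = draw(region_matrix[region], rng)
    provinces = REGION_PROVINCES[new_region]
    if len(provinces) == 1:
        return provinces[0]  # single-province region: nothing more to decide
    # Third decision: specific province, conditional on the old province.
    return draw(province_matrices[new_region][province], rng)


# Hypothetical usage with made-up probabilities.
region_matrix = {"Ontario": {"Atlantic": 0.2, "Quebec": 0.3, "Prairies": 0.2, "BC": 0.3}}
province_matrices = {
    "Atlantic": {"ONT": {"NFLD": 0.2, "PEI": 0.1, "NS": 0.4, "NB": 0.3}},
    "Prairies": {"ONT": {"MAN": 0.3, "SASK": 0.2, "ALTA": 0.5}},
}
new_province = migrate("ONT", "Ontario", 0.05, region_matrix,
                       province_matrices, random.Random(3))
```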
4.2 The Demographic Block Parameters


All of the processes of the Demographic Block, as
described in section 4.1.3 above, entail certain descriptive
probabilities. This section discusses the sources of
these data, and the way in which they were estimated. 


The actual parameters are listed in Appendix C. 




The probability parameters for all processes are 
assumed to be time-invariant with the exception of 
those for death. The parameters for birth and divorce 
are used directly as obtained from L. Stone of Statistics 
Canada. The parameters for emigration and marriage 
phase-I were estimated from raw data and care has been 
taken to be unbiased and consistent according to the 
sequential structure of the demographic block (see Appendix 
A.1). The death and family independence parameters were 
not estimated directly from raw data but by statistical 
inference. Finally, the parameters for marriage phase-II 
and interprovincial migration were computed in a straight- 


forward way. 


Almost all of the probability parameters were 
estimated from sources independent of our base year file.
Their appropriateness for the model is checked in the 
validation section A.3 which again uses independent data 


for this purpose. 


4.2.1 The Death Process Parameters 


The source of the Death Process parameters is the
Vital Statistics Division of Statistics Canada. The
original data consisted of "Survival Ratios" for the
years 1956, 61, 66, 69, 74, 79, and 84. A survival
ratio is the number of surviving individuals in a given
population at the end of a year divided by the total
number of individuals in that population at the begin-
ning of the year. These ratios can be identified as
probabilities of survival (see Appendix A.1). The
ratios are available by sex, and for single years of
age up to the age 100. Data for the first three years
were historical estimates, while those for the last
four were projections. The ratios are described in a
Statistics Canada publication by W. Zayachkowskii.*


As Zayachkowskii's paper suggests there is a time 
trend in these ratios. For young age groups this trend 
is upward and quite strong. For middle age groups 
there is practically no trend at all, and for older age
groups the trend is weak and downward. We have developed 
a curve-fitting model in order to comprehend these time 
trends analytically. According to this model the 
survival ratio, S, for any sex and age is an exponential 


function of time: 


$$
S = c - y \exp\{-a(t-1970)\}
$$


where c, y, a are parameters dependent on sex and age.


There are three constraints imposed upon the
parameters a, c, and y as indicated below:


(1) $a > 0$, because if $a < 0$ then the ratio S would become
unbounded as t increases.


(2) $0 < c < 1$, because c = S in the limit as time tends to
infinity. In other words, our time horizon is not
limited by our model except insofar as future data
might reflect revolutionary medical discoveries.


(3) $0 < c - y < 1$, because c - y = S for t = 1970.


* W. Zayachkowskii, "Mortality Projections for the
year 1969 DBS Population".


Our model is not typical of econometric or regression
models. Special procedures were followed based on the
fact that if we knew the parameter a the model would
become a simple linear regression.


Consider the following. Let three different
points in time be $t_1$, $t_2$, $t_3$, and let the observed
survival ratios be $S_1$, $S_2$, $S_3$ at these points respectively.
If our model is going to fit exactly at these three
points we will have

$$
\begin{aligned}
S_1 &= c - y \exp\{-a(t_1-1970)\} \\
S_2 &= c - y \exp\{-a(t_2-1970)\} \\
S_3 &= c - y \exp\{-a(t_3-1970)\}
\end{aligned}
$$

By subtraction we eliminate c, that is,

$$
\begin{aligned}
S_1 - S_2 &= -y\,\{\exp\{-a(t_1-1970)\} - \exp\{-a(t_2-1970)\}\} \\
S_2 - S_3 &= -y\,\{\exp\{-a(t_2-1970)\} - \exp\{-a(t_3-1970)\}\}
\end{aligned}
$$

We apply the mean-value theorem of differential
calculus by which $f(x_1) - f(x_2) = f'(z)(x_1 - x_2)$, where z
is a number between $x_1$ and $x_2$. In our case, we assume that

[...]

by dividing we have,

[...]

Solving this equation with respect to a gives an expression
in the ratio

$$
\frac{(t_1-t_2)(S_2-S_3)}{(t_2-t_3)(S_1-S_2)}
$$

Therefore, in order that our curve passes through any
given set of observations $(t_1, S_1)$, $(t_2, S_2)$, $(t_3, S_3)$,
the parameter a must satisfy the above equation. Since
we have seven observations which can be combined in
groups of 3, i.e., 35 such combinations, we have $\binom{7}{3} = 35$
estimates for the parameter a. We take as a final
estimate for a the simple arithmetic average of these
35 estimates.


Finally, when the parameter a is estimated our
model becomes a simple linear regression:

$$
S = c - yU, \qquad \text{where } U = \exp\{-a(t-1970)\}
$$

The linear regression estimates of c and y are reported
in Appendix C for each age (a = 0, 1, ..., 99) and
sex group.
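
The two-step estimation can be sketched as follows. This is an illustration under stated assumptions rather than the authors' program: the per-triple formula for a assumes that the intermediate point of the mean-value theorem is the midpoint of each interval, the survival ratios are synthetic values generated from made-up parameters, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

YEARS = [1956, 1961, 1966, 1969, 1974, 1979, 1984]  # the seven observation years


def estimate_a(times: Sequence[float], ratios: Sequence[float]) -> float:
    """Average the estimates of a over all triples (35 triples for 7 points)."""
    estimates: List[float] = []
    for (t1, s1), (t2, s2), (t3, s3) in combinations(list(zip(times, ratios)), 3):
        ratio = ((t1 - t2) * (s2 - s3)) / ((t2 - t3) * (s1 - s2))
        estimates.append(2.0 / (t1 - t3) * math.log(ratio))  # midpoint assumption
    return sum(estimates) / len(estimates)


def fit_c_y(times: Sequence[float], ratios: Sequence[float], a: float) -> Tuple[float, float]:
    """Ordinary least squares for S = c - y*U with U = exp(-a(t-1970))."""
    u = [math.exp(-a * (t - 1970)) for t in times]
    n = len(u)
    mean_u, mean_s = sum(u) / n, sum(ratios) / n
    slope = (sum((ui - mean_u) * (si - mean_s) for ui, si in zip(u, ratios))
             / sum((ui - mean_u) ** 2 for ui in u))
    return mean_s - slope * mean_u, -slope  # (c, y); the slope on U is -y


# Synthetic data from made-up parameters (NOT the Appendix C values).
TRUE_C, TRUE_Y, TRUE_A = 0.99, 0.01, 0.03
ratios = [TRUE_C - TRUE_Y * math.exp(-TRUE_A * (t - 1970)) for t in YEARS]

a_hat = estimate_a(YEARS, ratios)
c_hat, y_hat = fit_c_y(YEARS, ratios, a_hat)
```

Because the observation years are not equally spaced, the averaged estimate of a is close to, but not exactly equal to, the value used to generate the synthetic data.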


4.2.2 Emigration Process Parameters


Emigration probabilities were estimated from two
sets of data. The first consisted of counts of total
emigrants broken down by marital status (married, non-
married), sex, and 5 year age groups (0-14, 15-19, ...,
70+).* These counts were available for two periods,
June 1968 - June 1969, and June 1969 - June 1970. The
second set of data consisted of the Canadian popu-
lation, partitioned into the same classes, for the
years 1968, 1969 and 1970.** Averages of populations
for the years 1968 with 1969 formed the population at
risk, while averages of emigrants for the two given
periods formed the frequency of successes for the
period January 1, 1969 to January 1, 1970. The ratio
of emigrants divided by population gave us the estimates
of the probabilities in question.
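
For a single marital status, sex and age class, the calculation reduces to a ratio of two averages. The sketch below uses invented counts, and the function and argument names are assumptions made for illustration.

```python
def emigration_probability(emigrants_68_69: float, emigrants_69_70: float,
                           population_1968: float, population_1969: float) -> float:
    """Average emigrant count divided by the average population at risk."""
    average_emigrants = (emigrants_68_69 + emigrants_69_70) / 2.0
    population_at_risk = (population_1968 + population_1969) / 2.0
    return average_emigrants / population_at_risk


# Invented counts for one class (not taken from the unpublished data).
p_emigrate = emigration_probability(100, 140, 9_000, 11_000)
```

Here the average number of emigrants is 120 and the population at risk is 10,000, giving a probability of .012.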


4.2.3 The Birth Process Parameters


The birth process probabilities were supplied by
L. Stone of Statistics Canada.*** These probabilities
are stratified by age of the potential mother (14,
15, ..., 49), the number of children the mother already
has (0, 1, 2, 3, 4, 5 or more) and on the legitimacy of
the potential child.


* Emigration totals were derived from unpublished data,
Census Division, Statistics Canada.


** Canadian population figures from Census Division,
Statistics Canada.


*** Leroy O. Stone, "Preparation of Some Demographic and
Socio-Economic Data Inputs", Statistics Canada Internal
Report.


4.2.4 Divorce Process Parameters


These were also provided by Leroy Stone.* They 


are broken down by age and sex, and are listed in


Appendix C. 


4.2.5 Marriage Process I Probabilities 


These are the probabilities that a given individual
will get married. They are available by region (Maritimes,
P.Q., Ontario, Prairies, B.C.), by marital status
(single, other), by five year age brackets, and by sex.


The marriage probabilities were derived from the 


following raw statistics: 


(i) The number of marriages that occurred in 1971 by
age of bride, age of groom, marital status of
bride (i.e. single, widowed, divorced, separated),
and by province.** It was assumed that the
province where the marriage was registered was
also the province of residence of both the bride
and the groom. By proper aggregation we were then
able to obtain the distribution of marriages by
sex, age, marital status (reduced to single and
"other" only) and region.


* Leroy O. Stone, "Preparation of Some Demographic and
Socio-Economic Data Inputs", Statistics Canada Internal
Report.


** Marriage statistics were obtained from unpublished data
of the Vital Statistics Division of Statistics Canada.


(ii) The population of unmarried individuals in 1971 by
sex, age (5 year classes), marital status (single,
other), and region.*


The ratio of the first array to the second gave us 
the Marriage ratios. These ratios, adjusted by death 
ratios (see Appendix A.1), gave us the probabilities in
question. It should be noted that for the marital 


status "single" there were no recorded marriages for 
people 50 years and older. In other words, we had 
available only 7 age classes (15-19 to 45-49) for this 
group of people. It was therefore assumed that the 


probability of a single person over 50 getting married 


is zero.
4.2.6 Marriage Process II Probabilities 


From array (i) of the previous section we can
obtain a cross classification of the number of persons
who got married in 1971 by age of bride and age of
groom. That is, we can determine $N(a_m, a_w)$, which is
the number of men in age group $a_m$ that married women in
age group $a_w$. We identify 10 age classes,** and hence
N is a 10x10 matrix. Let

$$
N(\cdot, a_w) = \sum_{x=1}^{10} N(x, a_w) \quad \text{(sum of entries in column } a_w\text{)}
$$

and

$$
N(a_m, \cdot) = \sum_{x=1}^{10} N(a_m, x) \quad \text{(sum of entries in row } a_m\text{)}
$$

Then define

$$
M(a_m, a_w) = N(a_m, a_w) / N(a_m, \cdot) \quad \text{(10x10 matrix)}
$$

and

$$
F(a_m, a_w) = N(a_m, a_w) / N(\cdot, a_w) \quad \text{(10x10 matrix)}
$$

Each element of M, say an element from row $a_m$, is
obviously an estimate of the probability that a man in
age group $a_m$ will marry a woman in age group $a_w$.
Similarly, each column of F is an estimate of the same
probabilities for women.


* Population statistics were obtained from Vital Statistics


** The ten classes are: 14-19, 20-24, 25-29, 30-34, 35-39,
40-44, 45-49, 50-54, 55-59, 60+.
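
The construction of M and F amounts to normalising the count matrix by rows and by columns respectively. The sketch below uses a made-up 2x2 count matrix in place of the 10x10 array, and the function names are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def row_normalise(counts: Matrix) -> Matrix:
    """M: divide each entry by its row total (rows are the groom's age group)."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([n / total if total else 0.0 for n in row])
    return result


def column_normalise(counts: Matrix) -> Matrix:
    """F: divide each entry by its column total (columns are the bride's age group)."""
    totals = [sum(column) for column in zip(*counts)]
    return [[n / totals[j] if totals[j] else 0.0 for j, n in enumerate(row)]
            for row in counts]


# Made-up counts N(a_m, a_w) for two age classes only.
N = [[30, 10],
     [20, 40]]
M = row_normalise(N)     # each row of M sums to one
F = column_normalise(N)  # each column of F sums to one
```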


These two matrices M and F can be combined into a 
3 dimensional array P(s,a,a') which will be the probability 
that an individual of sex s who is in age bracket a 
will marry an individual of the opposite sex who is in 


age bracket a'. 


We can derive a similar probability distribution
relating the educational level of the two spouses.
Specifically, we determine Q(s, e, e') which is the
probability that an individual of sex s who is in
education bracket e will marry an individual of the
opposite sex who is in education bracket e'.* There
are three education classes: grade 8 and under, grade
9 to grade 13, and post-secondary. By assuming that
the two random variables age and education are stochastically
independent, we can calculate the joint distribution

$$
f(a, e, s; a', e') = P(s, a, a') \cdot Q(s, e, e')
$$

which is the probability that a person of age a,
education e, and sex s will marry an individual of the
opposite sex of age a' and education e'.


* This distribution was derived from the 1971 Survey of Consumer Finance.
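
Under the independence assumption the joint distribution is simply the product of the age and education distributions. The sketch below works for one sex at a time, and all numbers in it are invented. The function name and the use of integer class indices are assumptions.

```python
from typing import Dict, List, Tuple

Class = Tuple[int, int]  # (age class index, education class index)


def joint_distribution(P_s: List[List[float]],
                       Q_s: List[List[float]]) -> Dict[Class, Dict[Class, float]]:
    """f[(a, e)][(a2, e2)] = P_s[a][a2] * Q_s[e][e2] for a fixed sex s."""
    joint: Dict[Class, Dict[Class, float]] = {}
    for a, p_row in enumerate(P_s):
        for e, q_row in enumerate(Q_s):
            joint[(a, e)] = {(a2, e2): p * q
                             for a2, p in enumerate(p_row)
                             for e2, q in enumerate(q_row)}
    return joint


# Invented distributions: two age classes and three education classes.
P_male = [[0.75, 0.25],
          [0.40, 0.60]]
Q_male = [[0.5, 0.3, 0.2],
          [0.2, 0.5, 0.3],
          [0.1, 0.3, 0.6]]
f_male = joint_distribution(P_male, Q_male)
```

Since each row of P and of Q sums to one, the joint probabilities for any given (age, education) class also sum to one.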


4.2.7 The Family Independence Parameters 


The purpose of the family independence process is 
to declare dependent individuals independent of their 
family, and to cause them to "leave home" on a
probabilistic basis. Direct data for this process was 
not available, and it was therefore necessary to make 
inferences from available statistics. The estimation 
of the required probabilities was based on the model 


described below. 


We assume that an individual may be in one of 
three states: (1) dependent, (2) independent or unattached, 
or (3) married, divorced, widowed, or separated. We 
will denote these states by d for dependent; i for 
independent, and m for "not-single". We ignore death 
or emigration, and transitions between the above 3 


states are assumed to occur from year to year. 


We make the following definitions:


$\alpha$ = the probability that a person who is
dependent in a given year will become
independent in the subsequent year.


$\beta$ = the probability that a person who is
either dependent or independent in a
given year will become "not-single" in
the subsequent year.


$D_t(a)$ = the population of d type individuals of
age a at year t.


$I_t(a)$ = the population of i type individuals of
age a at year t.


$M_t(a)$ = the population of m type individuals of
age a at year t.


The shown diagram illustrates the transition
from one state to another.


Since the system is closed we can easily derive the
following identities:

$$
\begin{aligned}
I_{t+1}(a+1) &= I_t(a) + \alpha D_t(a) - \beta I_t(a) && (1) \\
D_{t+1}(a+1) &= D_t(a) - \alpha D_t(a) - \beta D_t(a) && (2) \\
M_{t+1}(a+1) &= M_t(a) + \beta D_t(a) + \beta I_t(a) && (3)
\end{aligned}
$$

By addition of terms in the above 3 equations we obtain:

$$
I_{t+1}(a+1) + D_{t+1}(a+1) + M_{t+1}(a+1) = I_t(a) + D_t(a) + M_t(a) \qquad (4)
$$

which is as we would expect, since the system is closed
and the population remains constant.


We further assume that the following relations hold:

$$
\frac{I_{t+1}(a+1)}{I_{t+1}(a+1) + D_{t+1}(a+1) + M_{t+1}(a+1)} = \frac{I_t(a+1)}{I_t(a+1) + D_t(a+1) + M_t(a+1)} \qquad (5)
$$

$$
\frac{D_{t+1}(a+1)}{I_{t+1}(a+1) + D_{t+1}(a+1) + M_{t+1}(a+1)} = \frac{D_t(a+1)}{I_t(a+1) + D_t(a+1) + M_t(a+1)} \qquad (6)
$$

If we add these two equations and subtract them from
unity we obtain:

$$
\frac{M_{t+1}(a+1)}{I_{t+1}(a+1) + D_{t+1}(a+1) + M_{t+1}(a+1)} = \frac{M_t(a+1)}{I_t(a+1) + D_t(a+1) + M_t(a+1)} \qquad (7)
$$

Relations (5), (6) and (7) assume certain stationary
properties of the distribution of our population. For
example, by (5) we can say that if the population of 20
year old independent individuals in 1968 was 7% of the
whole population of 20 year old individuals then 7% is
valid for the 20 year old population in 1967 or 1966
etc. The same interpretation can be given for relations
(6) and (7) as well.


By substitution of (4) and (1) into (5) we obtain,

    \frac{(1-\beta) I_t(a) + \alpha D_t(a)}{I_t(a) + D_t(a) + M_t(a)}
        = \frac{I_t(a+1)}{I_t(a+1) + D_t(a+1) + M_t(a+1)}         (8)

Similarly, substitution of (4) and (2) into (6) yields,

    \frac{(1-\alpha-\beta) D_t(a)}{I_t(a) + D_t(a) + M_t(a)}
        = \frac{D_t(a+1)}{I_t(a+1) + D_t(a+1) + M_t(a+1)}         (9)
Equation (8) can take the form,

    1 - \beta + \alpha \frac{D_t(a)}{I_t(a)}
        = \frac{I_t(a+1)}{I_t(a)} \cdot
          \frac{I_t(a) + D_t(a) + M_t(a)}{I_t(a+1) + D_t(a+1) + M_t(a+1)}   (10)

and equation (9) can take the form,

    1 - \alpha - \beta
        = \frac{D_t(a+1)}{D_t(a)} \cdot
          \frac{I_t(a) + D_t(a) + M_t(a)}{I_t(a+1) + D_t(a+1) + M_t(a+1)}   (11)

By subtraction of (11) from (10) and solving with respect
to α we obtain,

    \alpha = \frac{I_t(a)}{I_t(a) + D_t(a)} \cdot
             \frac{I_t(a) + D_t(a) + M_t(a)}{I_t(a+1) + D_t(a+1) + M_t(a+1)} \cdot
             \left[ \frac{I_t(a+1)}{I_t(a)} - \frac{D_t(a+1)}{D_t(a)} \right]   (12)


Equation (12) is a formula providing an estimate 
of the probability that an individual will become 
independent. All of the terms on the right hand side 
can be derived from the Survey of Consumer Finance. 
The manipulation of the actual figures is presented in 
Appendix C, together with the other probability parameters. 
The model failed to give meaningful results for ages 
over 26 because the relevant populations became statistically 
insignificant. It was therefore necessary to assume a 
constant probability of independence for dependent 


individuals over age 26. 
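The estimate in equation (12) can be illustrated with a short
sketch. It is only an illustration under stated assumptions: the
function name, the argument names and the example counts are
hypothetical and are not taken from the Survey of Consumer Finance.
The value of β is obtained here by rearranging equation (11), a step
the text does not spell out.

```python
def independence_probabilities(i_a, d_a, m_a, i_a1, d_a1, m_a1):
    """Return (alpha, beta) from one survey year's counts.

    i_a, d_a, m_a    : independent, dependent, not-single counts at age a
    i_a1, d_a1, m_a1 : the same three counts at age a+1
    All names are illustrative, not taken from the POLSIM software.
    """
    total_a = i_a + d_a + m_a
    total_a1 = i_a1 + d_a1 + m_a1
    scale = total_a / total_a1
    # Equation (12): probability that a dependent becomes independent.
    alpha = (i_a / (i_a + d_a)) * scale * (i_a1 / i_a - d_a1 / d_a)
    # Equation (11) rearranged: 1 - alpha - beta = scale * D(a+1) / D(a).
    beta = 1.0 - alpha - scale * d_a1 / d_a
    return alpha, beta


# Hypothetical counts built so that alpha = 0.10 and beta = 0.05 hold
# exactly under identities (1)-(3) and the stationarity assumption.
alpha, beta = independence_probabilities(100, 200, 50, 230, 340, 130)
print(round(alpha, 4), round(beta, 4))
```

Because the counts at age a+1 enter only through ratios, the sketch
gives the same answer whatever the relative size of the two age
groups, which is what the stationarity relations (5) to (7) imply.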


It should be noted that the "populations" referred
to above pertain to a particular sex. The derived
probabilities are therefore contingent on age and sex.


4.2.8 Interprovincial Migration Process Parameters 


The model of the internal migration process requires 
three sets of parameters, all of which were derived 
from a 5% longitudinal sample of tax filers collected 
by the Department of National Revenue for the years 


POG S70. = 


* Department of National Revenue (Taxation), unpublished data.

The first set of parameters is the array q1(R,I,A), which
is the probability that an individual in region R, income
class I, and age class A* will stay in his present region
of residence. The second set is the array q2(N,O), which is
the conditional probability that a person who is leaving
"old region" O will move to "new region" N. Note that since
this array is conditional on the person moving somewhere
else, q2(x,x) = 0 for all x = 1,2,3,4,5. The final set of
parameters is required because if a person** is found to
move to either the Prairies or to the Maritimes, the actual
province to which he is moving must be determined. Thus
q3(OP,NP) is the conditional probability that a person whose
new region is either the Maritimes or the Prairies, and
whose old province is OP (OP = 1, 2, ...) will move to the
province NP (NP = 1, 2, 3, ..., 7). We note that this set of
probabilities allows for migration within the Prairie
provinces, and within the Atlantic provinces. For example,
an individual who originally lived in Manitoba might be
found through array q1 to not leave the Prairie region. But
array q3 might then establish that although he did not leave
the Prairie region, he did in fact move from Manitoba to
Alberta.
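The three-stage use of these parameter sets can be sketched as
follows. This is a minimal illustration, not the POLSIM code: the
dictionary layouts, the function names and the example probabilities
are all assumptions. In particular, q3 is keyed here by (old province,
new region) and maps to a distribution over new provinces, and regions
with a single province are represented by the region name itself.

```python
import random

# Regions in which the destination province must still be chosen.
MULTI_PROVINCE_REGIONS = {"Atlantic", "Prairies"}


def draw(distribution, rng):
    """Draw one outcome from a {outcome: probability} mapping."""
    u = rng.random()
    cumulative = 0.0
    for outcome, prob in distribution.items():
        cumulative += prob
        if u < cumulative:
            return outcome
    return outcome  # guards against rounding in the probabilities


def simulate_move(region, province, income_class, age_class,
                  q1, q2, q3, rng):
    """Return (new_region, new_province) for one family head."""
    # Stage 1: q1 is the probability of staying in the present region.
    if rng.random() < q1[(region, income_class, age_class)]:
        new_region = region
    else:
        # Stage 2: q2 gives the new region, conditional on leaving.
        new_region = draw(q2[region], rng)
    # Stage 3: q3 picks the province inside a multi-province region,
    # which also allows moves within the Prairies or the Atlantic.
    if new_region in MULTI_PROVINCE_REGIONS:
        new_province = draw(q3[(province, new_region)], rng)
    else:
        new_province = new_region
    return new_region, new_province


# Hypothetical, degenerate probabilities reproducing the Manitoba
# example: the person stays in the Prairies but moves to Alberta.
q1 = {("Prairies", 3, 2): 1.0}
q2 = {"Prairies": {}}
q3 = {("Manitoba", "Prairies"): {"Alberta": 1.0}}
result = simulate_move("Prairies", "Manitoba", 3, 2,
                       q1, q2, q3, random.Random(0))
print(result)
```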
*  The regions are Atlantic, P.Q., Ontario, Prairies, B.C. The
   income classes are 0-$1,499; $1,500-$2,999; $3,000-$4,499;
   $4,500-$6,999 and $7,500 and over.

** Note that when we say "person" or "individual" we are really
   referring to a family head who is assumed to move his whole
   family with him.


4.3 Validation of the Demographic Block

No simulation ever yields perfect results. The purpose of
this section is to describe, in a general way, the reasons
why deviations can arise between simulated values and actual
measured values. We then go on to analyze these errors in
the context of the Demographic Block and to discuss the ways
in which they can be eliminated or reduced.


4.3.1 Additivity of Errors Principle


Consider the following:

1.  Let the "population at risk" be of size N, and let
    p be the probability that a certain event will
    occur to any given individual in this population.
    For example, we might be considering a population
    of 100 males in Ontario, and the probability that
    any of these individuals will survive for the
    period of a year might be p = .9986.

2.  If we now simulate the given event for the given
    population, we will achieve x "successes". In the
    above example, we might find that 90 of the original
    100 males in Ontario survive. The expected number
    of successes is

        E{x} = Np

    We can write the actual number of successes as

        x = Np + e_s

    where e_s is the "absolute simulation error",
    which is expected to be zero, but will in fact be
    some positive or negative number.

3.  Assume now that neither N nor p is known precisely.
    That is, we know N' instead of the true N, and p'
    instead of the true p.

    Let  \Delta N = N' - N

    and  \Delta p = p' - p

    The error ΔN will be called the "absolute initial
    population error", and the error Δp will be called the
    "absolute parameter error".


4.  Simulating a population of size N' with probability p'
    will yield \hat{x} successes, where

        \hat{x} = N'p' + e_s
                = (N + \Delta N)(p + \Delta p) + e_s
                = Np + N \Delta p + p \Delta N + \Delta N \Delta p + e_s

    From which,

        \hat{x} - Np = N \Delta p + p \Delta N + \Delta N \Delta p + e_s

    If we ignore the second order term ΔNΔp and divide
    through by Np, we can write

        \frac{\hat{x} - Np}{Np}
            = \frac{\Delta N}{N} + \frac{\Delta p}{p} + \frac{e_s}{Np}

    that is,

        \epsilon_T = \epsilon_N + \epsilon_p + \epsilon_s

    which expresses the "additivity of errors principle".

We should note that ε_T is the total error relative to the
true expected number of successes. The errors ε_N and ε_p
are expressed relative to the true population size and the
true probability respectively. Finally, ε_s is the
simulation error relative to the expected number of
successes.

We will now discuss each of these errors in turn, as they
apply to the processes of the Demographic Block.
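A small numerical check may help make the additivity of errors
principle concrete. The sketch below is illustrative only: the
function name and all of the figures are hypothetical. It computes
the total relative error directly and compares it with the sum of the
three relative components; the difference is the neglected second
order term.

```python
def relative_errors(n_true, p_true, n_used, p_used, e_s):
    """Return (eps_total, eps_n, eps_p, eps_s), all relative errors.

    n_true, p_true : true population at risk and true probability
    n_used, p_used : the values N' and p' actually used in the run
    e_s            : absolute simulation error of the run
    """
    expected = n_true * p_true
    eps_n = (n_used - n_true) / n_true      # initial population error
    eps_p = (p_used - p_true) / p_true      # parameter error
    eps_s = e_s / expected                  # simulation error
    simulated = n_used * p_used + e_s       # successes actually obtained
    eps_total = (simulated - expected) / expected
    return eps_total, eps_n, eps_p, eps_s


# Hypothetical case: 3% population error, 5% parameter error and an
# absolute simulation error of 2 successes.
eps_total, eps_n, eps_p, eps_s = relative_errors(1000, 0.10, 1030, 0.105, 2.0)
second_order = eps_n * eps_p
print(eps_total, eps_n + eps_p + eps_s, second_order)
```

With errors of a few percent the neglected term is of the order of a
tenth of a percent, which is why it can be ignored.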
4.3.2 Simulation Errors

Again consider a population of size N' subject to some
event with probability p'. Let x be the number of
"successes" for the group in question. As in the analysis
of section 4.3.1, we know that the actual number of
successes will be

    x = \bar{x} + e_s = N'p' + e_s

where e_s is the absolute simulation error. The simulation
error will have expected value zero (if the random number
generator is perfect), and its variance will be
N'p'(1-p'). That is,

    E{e_s^2} = Var{x} = E{(x - \bar{x})^2} = N'p'(1-p')

What we wish to do now is determine the expected bounds
of this error. That is, we wish to set certain natural
limits on the capability of micro-simulation.

We can establish confidence intervals for e_s^2 by using
Tshebysheff's Lemma.* This proposition states that "for any
non-negative random variable u with known expectancy
\bar{u}, the event {u < t^2 \bar{u}} for any t > 1 has
probability greater than 1 - 1/t^2". Applying this lemma, we
can set u = e_s^2, \bar{u} = N'p'(1-p') and t = 2. This
gives

    Prob{ e_s^2 < 4N'p'(1-p') } > 0.75

or,

    Prob{ |e_s| < 2 \sqrt{N'p'(1-p')} } > 0.75

Generally we are interested in the relative simulation
error with respect to the expected number of successes.
We therefore obtain,

    Prob{ \left| \frac{e_s}{N'p'} \right| < 2 \sqrt{\frac{1-p'}{N'p'}} } > 0.75

We can state this in words. At a confidence level higher
than 75% the relative simulation error ε_s with respect to
the expected number of "successes" cannot exceed in
magnitude the quantity 2 \sqrt{(1-p')/(N'p')}.

Table 4.1 gives the expectancy \bar{x} = N'p' for a binomial
distribution. Table 4.2 contains the maximum relative
simulation error (at higher than 75% confidence level) for
selected values of N' and p'. By inspection of Table 4.2
we can see that for any given population size, the maximum
relative simulation error is inversely and
non-proportionately related to the size of the probability
of the event. Similarly, for any given value for the
probability of an event, the maximum relative simulation
error is inversely and non-proportionately related to the
size of the population at risk. Two conclusions can be
drawn from Table 4.2.


* Uspensky, Introduction to Mathematical Probability, pp. 182-187.


TABLE 4.1
EXPECTED VALUES FOR BINOMIAL DISTRIBUTIONS


First, small population sizes at risk (N') combined with
small probabilities of success (p') will yield very large
relative simulation errors. We can illustrate this with
respect to survival simulations. The average probability
of death for males in the 5-9 age bracket is p = .00063.
The following Table 4.3 gives populations, expected number
of deaths, simulated deaths, and the calculated simulation
errors. The last row gives the maximum relative simulation
error.


Table 4.3

Death Simulation: Males Aged 5-9 Years

Region              Maritimes   P.Q.         Ontario      Prairies   B.C.      Canada
Pop. at risk        [illeg.]    7,808        [illeg.]     3,849      2,207     [illeg.]
Expected value
  of deaths         1.59        4.91         [illeg.]     2.42       1.39      [illeg.]
Simulated deaths    2           6            5            5          0         [illeg.]
Relative
  simulation error  -25.8%      [illeg.]     [illeg.]     -106.6%    -100.0%   [illeg.]
Max. relative
  simulation error  159%        90%          89%          129%       170%      [illeg.]


From the above table it is clear that although our
simulation results are acceptable, there is a very wide
margin of simulation error that can arise when we are
working with such unlikely events. In the case of Ontario,
for example, the simulation was excellent. But it can be
seen that this is purely the result of chance, in the
strictest meaning of the word. We could just as easily have
had a very large error. But it should be noted that relative
error has little meaning when we are considering very
unlikely events. In these cases absolute error is the more
relevant concept, and it can be seen that by this criterion
the simulation performs very well.


It should be noted also that one of the reasons 
the relative simulation error seems very high is that 
it is defined with respect to the expected number of 
successes. Since the latter can sometimes be very 
small, the relative error can sometimes be very large. 
Alternatively, one could define "success" to be survival, 
rather than death. This will give us an indication of 
the extent to which the simulation error affects the 
whole population. This of course reverses the situation 
we had previously. The simulation survival error for 
B.C. (which is the province with the largest death relative 
simulation error) is now 


    \frac{|e_s|}{N'p'} = \frac{1.39}{2,205.61} = .00063

and the maximum simulation error is

    2 \sqrt{\frac{.00063}{2,207 \times .99937}} = .00107

We can thus see that the simulation error will create a
deviation from the whole population by one thousandth
at the most.

The second conclusion which we may draw (really the
converse of the first) is that large population sizes at
risk (N') combined with large probabilities of success (p')
will yield small relative simulation errors. We will
consider two examples.


(a) The aggregate death probability is p = .0077, while
    the total population in our 2% sample is approximately
    400,000. The expected number of deaths is x = 2,800
    while the maximum relative simulation error could
    go as high as 3.76% (at a confidence level higher
    than 75%).

(b) The aggregate fertility probability is p = .07,
    while the female population between the ages of 15
    and 49 is N = 213,000. The expected aggregate
    number of births is x = 14,910, and the maximum
    relative simulation error could go as high as .78%
    (at a confidence level higher than 75%).

We can conclude, then, that at the national aggregate
levels the birth and death processes will be almost
entirely free of simulation errors.


The above analysis of simulation errors was based on
error-free populations at risk and probabilities. In
practice, neither populations at risk nor probabilities are
error-free. Therefore, exact evaluation of the simulation
error is not possible. However, the maximum relative
simulation error can still be evaluated, using the above
suggested expression 2 \sqrt{(1-p)/(Np)}, at the confidence
level of 75%. Values of this expression are contained in
Table 4.2. In Appendix C we do not calculate these errors
but they can be either calculated or obtained directly
from Table 4.2.
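For readers without Table 4.2 to hand, the bound can be computed
directly. The following is a minimal sketch of the expression
2 \sqrt{(1-p)/(Np)}; the function name is an assumption, and the
second pair of input values is hypothetical.

```python
import math


def max_relative_simulation_error(n, p):
    """Bound on |e_s| / (N p) that holds with probability above 0.75.

    n : population at risk
    p : probability of the event for one individual
    """
    return 2.0 * math.sqrt((1.0 - p) / (n * p))


# A rare event in a small population (the B.C. figures of Table 4.3)
# and a common event in a larger, hypothetical population.
for n, p in [(2207, 0.00063), (10_000, 0.5)]:
    bound = max_relative_simulation_error(n, p)
    print(n, p, f"{100 * bound:.1f}%")
```

The first case gives a bound well above 100%, the second a bound of a
few percent, which is the pattern described in the two conclusions
above.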


4.3.3 Initial Population Errors

According to the Additivity of Errors Principle (cf.
section 4.3.1) any imperfection of the initial year tape
will generate errors for the simulated population of the
following year. No action is taken to rectify this, but an
assessment of these so-called initial population errors is
always necessary. This necessity is obvious because these
errors put a lower bound on the total errors. For example,
if there is a 3% overestimate in the number of females of
age 20-24 in the initial year file, then the simulated
births from these women will, under perfect conditions
(i.e., correct birth probabilities and negligible
simulation error), be 3% too high. Of course, the total
error might be higher, e.g., 7%, in which case the other
two error components (simulation and parameter) had an
additive effect. Or the total error might be lower, e.g.,
1% or -2%, in which case the other two errors had
cancelling effects.

The initial population errors have cumulative effects when
the simulation is done over a period of more than one year.
The present report does not attempt to assess the extent of
this cumulative error. Instead, Appendix C.3 simply
documents the size of these errors, as they pertain to
certain of the Demographic Block processes. For example,
from table C.31 we have for Quebec a 12% underestimate of
the 65-69 year old females, while the same age bracket
males in Ontario are underestimated by 18.5%. Also the males
in Ontario aged 25-29 are overestimated by 9.2%; this is
one of the few overestimates we have on the initial
population as of April 1, 1968. These are some of the worst
initial population errors. Most do not exceed ±5%, as one
can verify from table C.31. Finally, one could observe from
table C.32 that the base year file underestimates the
number of married women 15-19 years of age by 24.2%. This
indeed is a serious shortcoming because this group has high
fertility probabilities and this particular underestimate
contributes to initially large errors in the number of
births. As the simulation proceeds, of course, the birth
error is progressively corrected as more accurately
estimated initial population female cohorts move into their
high fertility years.
4.3.4 Validation and Parameters' Calibration 


The various probability parameters used in the Demographic
Block are provided from various sources. Some of them were
estimated from raw data by taking ratios of successes over
populations at risk. Others were supplied ready for use
from other studies. Finally, some were estimated by
inference from time series or from simplified models. It is
evident that before we use the probabilities of the
Demographic Block, they should first be validated and if
necessary adjusted.

In principle the validation problem can be stated as
follows:

Let p be the probability that a certain event will take
place to an individual with certain characteristics. Also
let N be the observed population of such individuals and A
the observed number of successes of the event in question.
Under the hypothesis that p is the correct probability, the
confidence interval for the successes at a confidence level
of 90% is the interval:

    I = { Np - 1.65 \sqrt{Np(1-p)},  Np + 1.65 \sqrt{Np(1-p)} }

If the observed successes A are within this interval then
there is no evidence against the correctness of the
probability p. If, on the other hand, this is not the case
then we assume that p is incorrect and that it should be
adjusted and replaced by:

    p_{new} = C p

where the correction factor C = \frac{A}{Np}.


The above procedure can be applied on aggregate statistics
rather than the stratified ones, which in general are not
available. For example, the birth probabilities are
provided by Stone in single years of age of the potential
mother, her birth parity, and her marital status. For the
year 1968, however, we had reliable statistics on
population at risk and observed births by marital status,
i.e., legitimate and illegitimate cases, and 5-year age
brackets. The birth parity stratification was not
available. For this reason, instead of the original
probabilities we obtained average probabilities stratified
by age and marital status only. The average of these
probabilities is theoretically justified by Poisson's
Theorem on weighted averages (see J.V. Uspensky,
Introduction to Mathematical Probability, pp. 208-215). On
these average probabilities we applied the validation
procedure outlined above. If a certain average probability
needed to be corrected, then the corresponding correction
factor was applied to the whole set of original
probabilities whose average is the one in question.
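The validation and calibration rule described above can be sketched
in a few lines. This is an illustration under stated assumptions, not
the procedure as programmed for POLSIM: the function names and the
example figures are hypothetical, and the stratified probabilities are
represented simply as a list.

```python
import math


def correction_factor(p, n, observed, z=1.65):
    """Return 1.0 if the observed successes fall inside the 90%
    interval around N p, otherwise the correction factor C = A / (N p).
    """
    expected = n * p
    half_width = z * math.sqrt(n * p * (1.0 - p))
    if expected - half_width <= observed <= expected + half_width:
        return 1.0          # no evidence against p
    return observed / expected


def calibrate(probabilities, c):
    """Apply one correction factor to a whole set of stratified
    probabilities whose average was the probability tested."""
    return [c * q for q in probabilities]


# Hypothetical example: an average probability of .07 applied to
# 10,000 women, with 800 births observed instead of the expected 700.
c = correction_factor(0.07, 10_000, 800)
adjusted = calibrate([0.05, 0.07, 0.09], c)
print(round(c, 4), [round(q, 4) for q in adjusted])
```

Applying the factor to every member of the set preserves the relative
pattern of the original stratified probabilities while moving their
average to the observed rate.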


In Appendix C.2 the validation results for the 
emigration, birth, marriage, divorce and survival 
processes are presented. The processes are validated 
and the probability parameters are calibrated in the 
above mentioned way. However, no validation was done 
on the Family Independence Parameters due to lack of 


complete information. 


The Interprovincial Migration Parameters were validated in
the following simple way, owing to a lack of any
information other than the original data from which these
parameters were estimated. We ignored birth, immigration,
emigration and death, and analytically estimated the
distribution of population in April 1972 from the Consumer
Finance distribution in April 1968. This was done by moving
families from province to province on a year to year basis.
The process was analytical and implied the closed Markovian
system identity:

    X_{t+1} = P X_t

where X_t, X_{t+1} are the family counts vectors at year t
and t+1, respectively, and P is the transition matrix,
which depends on income and age of the heads of the family
units in question. In Appendix C these results are
presented and compared with the distribution of population
in April 1972 as recorded on the corresponding Consumer
Finance Survey tape.
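The year to year projection can be sketched as a repeated
matrix-vector product. The sketch below makes several simplifying
assumptions that are not in the report: it uses a single transition
matrix rather than one for each income and age class, it treats the
matrix as column-stochastic (entry [i][j] is the probability of moving
from province j to province i), and the two-province figures are
hypothetical.

```python
def project_one_year(counts, transition):
    """Apply X_{t+1} = P X_t once.

    counts[j]        : families living in province j
    transition[i][j] : probability that a family in province j is in
                       province i one year later (columns sum to 1)
    """
    size = len(counts)
    return [sum(transition[i][j] * counts[j] for j in range(size))
            for i in range(size)]


def project(counts, transition, years):
    """Apply the one-year projection repeatedly."""
    for _ in range(years):
        counts = project_one_year(counts, transition)
    return counts


# Hypothetical two-province system projected from 1968.
P = [[0.9, 0.2],
     [0.1, 0.8]]
x_1968 = [120.0, 30.0]
x_1969 = project(x_1968, P, 1)
x_1972 = project(x_1968, P, 4)
print(x_1969, x_1972)
```

Because the system is closed, the total number of families is the
same in every projected year; only its distribution changes.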
5. THE ACTIVITY STATUS BLOCK


This chapter is divided into five main sections.
Section 5.1 gives a brief overview of the block as a whole.
Section 5.2 describes the logic of the simulation in detail,
and makes explicit all the various assumptions that are
made. Section 5.3 discusses the labor force model that is
the heart of the Activity Block, and details all the
mathematical adjustments that were required to refine this
basic model. Section 5.4 is concerned with indicating how
well the model performs, as compared with historical data.
Section 5.5 gives a complete description of all of the data
that is used, and the sources from which it was obtained.
Appendix D provides greater detail, lists the data and
documents the relevant computer software.


5.1 General Overview 


5.1.1 Purpose of the Activity Block

Broadly speaking, the purpose of the Activity Status
Block is twofold: first, to update those variables in the
individual state vector that describe what a person is
"doing" during the year being simulated; and, second, to
make certain adjustments to the individual state vector
preparatory to updating the person's income in the Market
Income Block. Specifically, the Activity Block does the
following: (a) it determines the number of weeks during the
year being simulated that the person spends in school,
employed, unemployed, and in the non-labor force (where
"non-labor force" is defined so as to exclude those in
school); (b) it determines whether a person advances his
education; (c) it determines the person's activity at the
end of the year being simulated or the beginning of the
next simulation year (the "year" being simulated is defined
as April through March); (d) it makes any necessary changes
in a person's employment category ("Type"); and finally,
(e) it converts a person's annual wages to a weekly wage
rate if he was employed as a Class B person in the year
preceding the one being simulated. (A "Class B" person is
one who is subject to unemployment.) The Activity Block
makes the requisite changes in the individual's state
vector on the basis of his present year's demographic
characteristics, his past year's activities, and
exogenously determined monthly Canadian aggregate rates of
unemployment.


5.1.2 Methodology and Data 


The methodological approach taken by the Activity
Block is somewhat different from that taken by the other
blocks in POLSIM. In the Market Income Block, for example,
the various components of an individual's income are
simulated directly. A transition matrix determines to what
extent, if any, a particular component of a person's income
is to be increased or decreased during the simulated year.
In the Activity Block such direct transitions are not
possible. Instead, a person is thought of as being in one
of four activity states: employment, unemployment,
non-labor force, or school. Transitions among these four
states then take place month by month, tracing out a year
long "activity history" for a given individual. The number
of weeks in each of the four states, any change in
education status, and any changes in employment category
are then inferred from this history.
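The idea of tracing out a monthly activity history and then
summarizing it can be sketched as follows. Everything specific in the
sketch is an assumption made for illustration: the transition
probabilities are invented, the same matrix is used for every month,
and months are converted to weeks in equal shares of 52. The report
computes its matrices from other model data rather than reading them
in like this.

```python
import random

STATES = ("employment", "unemployment", "non-labor force", "school")

# Hypothetical monthly transition probabilities. Each row gives the
# probabilities of next month's state, given this month's state.
EXAMPLE_MATRIX = {
    "employment":      {"employment": 0.95, "unemployment": 0.03,
                        "non-labor force": 0.02, "school": 0.00},
    "unemployment":    {"employment": 0.30, "unemployment": 0.60,
                        "non-labor force": 0.10, "school": 0.00},
    "non-labor force": {"employment": 0.05, "unemployment": 0.05,
                        "non-labor force": 0.85, "school": 0.05},
    "school":          {"employment": 0.05, "unemployment": 0.05,
                        "non-labor force": 0.05, "school": 0.85},
}


def simulate_activity_history(start_state, monthly_matrices, rng):
    """Return the state occupied after each monthly transition."""
    history = []
    state = start_state
    for matrix in monthly_matrices:
        u = rng.random()
        cumulative = 0.0
        for candidate in STATES:
            cumulative += matrix[state][candidate]
            if u < cumulative:
                state = candidate
                break
        # If rounding leaves u above the total, the state is unchanged.
        history.append(state)
    return history


def weeks_by_state(history):
    """Convert a monthly history into weeks spent in each state."""
    weeks_per_month = 52.0 / len(history)
    totals = {state: 0.0 for state in STATES}
    for state in history:
        totals[state] += weeks_per_month
    return totals


rng = random.Random(1968)
history = simulate_activity_history("school", [EXAMPLE_MATRIX] * 12, rng)
weeks = weeks_by_state(history)
print(history)
print(weeks)
```

Passing a list of twelve matrices leaves room for month-specific
probabilities, which the text indicates are needed.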




Much of the Activity Block is concerned with the
problem of calculating the transition matrices that are used
to determine how a person will move from month to month.
That is, these matrices are for the most part not read in as
data, but rather are calculated from other data in the model.
How this is done will be discussed in some detail below.
5.1.3 The Activity Variables 


There are seven variables in the individual's state
vector that the Activity Block is concerned with.

1.  Weeks in School

    This is simply the number of weeks during the
    year (the year being defined as April through
    March) that the person spends in one or more of
    the 19 education states.

2.  Weeks Employed

    The number of weeks during the year that the
    person spends in the employed state.

3.  Weeks Unemployed

    The number of weeks during the year that the
    person spends in the unemployed state.

4.  Weeks in Non-Labor Force

    The number of weeks during the year that the
    person spends in the (narrowly defined) non-labor
    force. The Activity Block distinguishes two kinds
    of non-labor force status. The generalized
    non-labor force (GLF) includes all people who are
    neither employed nor unemployed. Thus students,
    housewives, and children would fall into this
    category. The narrowly defined non-labor force
    (NLF) is the same as the GLF, except that it does
    not include students.

5.  Education

    There are 19 possible education states that a
    person might be in. They are self-explanatory:

         1  Grade 9          11  Univ 3
         2  Grade 10         12  Univ 4
         3  Grade 11         13  Univ 5
         4  Grade 12         14  Univ 6
         5  Grade 13         15  Univ 7
         6  CAAT 1           16  Univ 8
         7  CAAT 2           17  Univ 9
         8  CAAT 3           18  Univ 10
         9  Univ 1           19  Less than Grade 9
        10  Univ 2

6.  April Activity Status

    The April Activity Status defines what a person is
    doing in April of the year following the one being
    simulated. The reason for this variable is that
    for the month-to-month simulation it is necessary
    to know what state the person will be in when he
    starts out.




    April Activity Status thus provides continuity
    from year to year.

    There are 23 Activity States, and they again are
    self-explanatory.

         1  Grade 9          13  Univ 5
         2  Grade 10         14  Univ 6
         3  Grade 11         15  Univ 7
         4  Grade 12         16  Univ 8
         5  Grade 13         17  Univ 9
         6  CAAT 1           18  Univ 10
         7  CAAT 2           19  Unused
         8  CAAT 3           20  Employment
         9  Univ 1           21  Unemployment
        10  Univ 2           22  Non Labor Force
        11  Univ 3           23  Age less than 14
        12  Univ 4

7.  Employment Category (Type)


Different kinds of people relate to the labor force 
in different ways. The Activity Block distinguishes 
people who will never become unemployed, and people 
who will; people who are retired and people who are 
not; people who are in pensionable employment and 
people who are not; and people who have become 
employed for the first time or retired for the 
first time. The Employment Category is a sort of 
catch-all variable that supplies all of this 


information. 


There are 13 possible values for "TYPE": 


(a) 14 - This is a "Class A" person (a male who
         is either self-employed, or employed in
         a professional, technical, or managerial
         capacity, and who by assumption may never
         become unemployed) who on retirement will
         receive no private pension.

(b) 15 - A "Class A" person who on retirement
         will receive a private pension.

(c) 24 - A "Class B" person (by definition not a
         "Class A" person; someone who is subject
         to unemployment) who on retirement will
         receive no private pension.

(d) 25 - A "Class B" person who on retirement
         will receive a private pension.

(e) [illegible] - A person who has never been a
         member of the labor force.

(f) 4  - A retired person who does not receive a
         private pension.

(g) 5  - A retired person who is eligible for
         private pension.

(h) 140, 150, 240, 250, 40, 50 -
         These are exactly the same as 14, 15, 24, 25, 4
         and 5 except that the person has not yet had an
         initial income assigned to him, or has just had an
         initial income assigned but has not yet gone
         through an income transition.


5.1.4 General Organization of the Activity Block 


A general picture of the Activity Block can be obtained
from the Macro Flow Chart given in figure 5.1. The program
begins by reading in all of the input parameters. These
parameters include regression coefficients and unemployment
rates from which an initial set of transition matrices are
calculated, parameters that are used to adjust these
probabilities in various ways, and probabilities used to
determine whether and how a person advances through school.
Once all of the data is properly constructed, individuals
are passed through the logic of the simulation model proper.
In this section persons are directed through various
processes, depending on how they relate to the labor force
(their Employment Category), and a few other factors.
Children, for example, bypass most of the Activity processes
completely. And persons in employment categories 14 or 15
bypass the monthly labor force simulation. The relevant
state variables are then updated, and the whole procedure is
repeated for the next individual.


FIGURE 5.1 - MACRO FLOW CHART OF THE ACTIVITY STATUS BLOCK

    START
    READ INPUT PARAMETERS
    CALCULATE TRANSITION PROBABILITIES
    READ AN INDIVIDUAL RECORD
    ENTER LFCHG (this is the main subroutine: the person is
        channeled depending on employment category and age;
        the monthly simulation is carried out if necessary;
        state variables are updated)
    EXIT LFCHG
    LAST RECORD? (NO: return to READ AN INDIVIDUAL RECORD)


5.2 Detailed Organization and Assumptions of the Activity Block 
The analysis of an individual's activities can be
traced with reference to the flow chart in figure 5.2. It
defines the way the activity variables of an individual's
state vector are updated. The idea behind the whole process
is to infer what an individual does during a year by
examining what he does in every month of that year. A year
is taken to run from April through April (thus providing a
starting point for the next year's simulation). An
individual thus starts out in April doing something (being
in a particular education state, being employed, etc.) and
this then influences (together with the exogenously inputted
influence of aggregate unemployment) what he will be doing
in May, and so on. The whole year is traced out in this way.
An individual moves from state to state, depending on his
present state, his age, the month being considered, and
other factors. Once the entire year has been completed, the
year itself can be summarized in the relevant state
variables.


FIGURE 5.2 - THE ACTIVITY BLOCK


In some cases it is unnecessary to analyze every 
month of the year. Depending on the kind of person in 
question, it is sometimes possible to determine directly 
what his year's activities will be, without examining his 
month by month behavior. Certain aspects of a person's 
state vector will determine to what extent the detailed 


analysis can be bypassed. 


The first of these determinants is age. Ifa 
person is younger than 14 it is simply assumed that he is a 
student in primary school. His education status remains 
unchanged at "less than grade 9", and it is assumed that he 
spends 40 weeks in school and the remaining 12 in the non- 
labour force. By definition he has no “April Activity 


Status" since he is too young to be considered a participant. 


When a person reaches age 14 he is treated in a
more complex manner. It is assumed in this case that he 
enters grade 9 in the first month of the simulated year 
(April). He then goes through the month by month simulation 
to be discussed below. It might be objected that the person 


should start grade 9 in September, and in fact this is what
the month by month simulation will finally show. He is
initially placed there in April, however, simply for reasons 


of modelling convenience. This question will be discussed 


further below. 


If a person is older than 14 then the way in which 
his year is analyzed depends on the employment category he 
is in. As mentioned earlier, the employment category is a 
state variable designed to keep track of a person's relation 
to the labor force. It indicates, for example, whether a
person has retired, when he retired, whether he is in pensionable 
employment, and whether he will be subject to unemployment. 
It is thus necessary to examine this variable before beginning 


the monthly labor force simulation. 


If a person has retired in the year previous to
the one being simulated, it is necessary to change his
"TYPE" so as to indicate that he is now in his second year
of retirement. Thus TYPEs 40 and 50 are changed to 4 and
5. This change is necessitated by the fact that in his
first year of retirement a person must be assigned an initial
private pension. In the second and subsequent years of his
retirement this person's pension is assumed to be unchanged.
It is therefore necessary to distinguish the first year of
retirement from subsequent years of retirement.


If a person is retired then the relevant state 
variables can be assigned directly. It is simply assumed 
that he spends 52 weeks in the non-labor force and that he 
will remain in the non-labor force in all subsequent years 


as well. 
If a person's "TYPE" is given as 140, 150, 240,
or 250 then the year being simulated is his second year as
a labor force participant, and it is necessary to divide
these by 10 for the remaining years of the simulation.
The reason for this is similar to that necessitating the
difference between the first and second year of retirement.
The fact that his "TYPE" is a factor of 10 too high indicates
that last year was his first year as a full time labor force
participant. At that time it was necessary to know this
fact, because a first year participant had to be assigned
an initial income in the Market Income Block. Henceforward
the distinction is no longer necessary.
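The recoding of first-year TYPE values can be illustrated
with a short sketch. It assumes the first-year codes are exactly
40, 50, 140, 150, 240 and 250; the function name normalize_type
is hypothetical and is not taken from the POLSIM source deck.

```python
# Hypothetical helper: the names are illustrative, not from the POLSIM deck.
# Assumption: a trailing zero on these codes marks a "first year" status.
FIRST_YEAR_CODES = {40, 50, 140, 150, 240, 250}


def normalize_type(type_code: int) -> int:
    """Drop the first-year marker by dividing the TYPE code by 10."""
    if type_code in FIRST_YEAR_CODES:
        return type_code // 10
    return type_code  # all other codes (3, 4, 5, 14, 15, 24, 25) are unchanged
```

Applied once at the start of each simulated year, this leaves
second-year retirees as TYPE 4 or 5 and second-year participants
as TYPE 14, 15, 24 or 25.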


It is next necessary to determine whether or not a
person will retire in the year being simulated. We distinguish
two kinds of individuals. The first, whose "TYPE" ends in a
five, is a person who is eligible for a private pension on
retirement. This kind of person is eligible for retirement
between the ages of 60 and 65 inclusive. Whether or not he
will retire in the year being simulated is determined by
sampling from a probability distribution. The second kind
of individual is one who is not eligible for a private
pension. All of these persons are assumed to retire at age
65. For those persons who have never been in the labor force,
and are hence "TYPE 3's", it is of course unnecessary to test
for retirement. If their age is 65 however, they are designated
as "TYPE 4", since it is no longer necessary to test month
by month whether they will enter the labor force.
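The retirement test just described can be sketched as follows.
This is a minimal illustration only: the probability table is
invented for the example (the report does not list the actual
distribution at this point), and the function and variable names
are hypothetical.

```python
RETIRED_TYPES = {4, 5, 40, 50}


def retires_this_year(type_code, age, p_retire, draw):
    """Decide whether a person retires in the simulated year.

    type_code: TYPE after any first-year code has been divided by 10.
    p_retire:  assumed map of age -> retirement probability (ages 60-65).
    draw:      a uniform random number in [0, 1).
    """
    if type_code in RETIRED_TYPES or type_code == 3:
        # Already retired, or never in the labor force: no test is made.
        # (A TYPE 3 person reaching 65 is simply recoded to TYPE 4.)
        return False
    if type_code % 10 == 5:
        # Eligible for a private pension: sample between ages 60 and 65.
        return 60 <= age <= 65 and draw < p_retire[age]
    # Not eligible for a private pension: retirement is assumed at 65.
    return age == 65


# Illustrative probabilities only; these are assumptions, not POLSIM values.
EXAMPLE_P_RETIRE = {60: 0.10, 61: 0.10, 62: 0.15, 63: 0.20, 64: 0.30, 65: 1.0}
```

In use, draw would come from a random number generator, for
example random.random() in Python.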
The final means by which a person may bypass the
monthly simulation depends on whether he is a Class A individual
(TYPE is 14 or 15) or a Class B individual (TYPE is 24 or
25). A Class A individual is assumed always to be employed.
He cannot leave the employment state until he is retired.
He is thus assumed to spend 52 weeks in the employment
state, his April activity remains "employed", and his education
status remains unchanged.


All of the above cases are those in which the person 
does not pass through the month to month simulation for one 
reason or another. The majority of people, however, are in 
employment categories 24, 25, or 3 (full time labor force 
participants subject to unemployment, and non-full time labor 
force participants 14 years and over), and their behavior 
must be analyzed on a monthly basis. The purpose of the 
monthly simulation is to determine the person's activity in 
every month of the year being simulated as well as April of 
the subsequent year, and to update his education status if 
he enters school in September. It is first necessary, 
however, to make a change in the wage variable of these 
persons. Since the person may not be fully employed, it is 
necessary, in determining his annual wage, to know both the 
number of weeks in which he is employed and his weekly wage 
rate. The Activity Block determines the former, and the 
Market Income Block determines the latter. Since the Market 
Income Block proceeds on the basis of weekly wage rate 
transitions, it is important that an individual's last 
year's weekly wage not be lost. It is thus calculated in 


the Activity Block (by dividing WAGES by WKEMP) and stored in
the WAGES location until it can be updated in the Income
Block. The Activity Block now calculates a new WKEMP which
is then used with the new weekly wage rate calculated in the 


Income Block to calculate a new annual wage. 
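The wage bookkeeping described above amounts to two small
calculations, sketched here. The handling of a person with zero
weeks of employment is an assumption made for the example; the
report does not say how that case is treated.

```python
def to_weekly_wage(wages, wkemp):
    """Convert last year's annual WAGES into a weekly wage rate."""
    if wkemp <= 0:
        # Assumption: with no weeks employed there is no rate to carry over.
        return 0.0
    return wages / wkemp


def annual_wage(new_weekly_rate, new_wkemp):
    """Combine the Income Block's weekly rate with the new WKEMP."""
    return new_weekly_rate * new_wkemp
```

For example, annual wages of 5200 over 52 weeks give a weekly
rate of 100; if the Income Block later moves the rate to 110 and
the Activity Block finds 40 weeks of employment, the new annual
wage is 4400.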


The month to month simulation now proceeds as follows. 
If the person is 14 then he is assumed to be in grade 9 for 
every month of the year with the exception of the summer months. 
The effect of this will be to show the person as spending 
April through June and September through March in "school", and 
will leave him in grade 9 in April of the subsequent year. He 
will then be a year older and will in all likelihood graduate 
to grade 10 the following September. Strictly speaking, 
the simulation treats him as being in grade 9 during three 
months in which he is actually in grade 8 (April through June 
of the year in which he is 14), but this is irrelevant insofar 
as the "yearly" state variables are concerned. The "weeks in 


school", etc., are still tabulated correctly.


For all persons older than 14 the simulation considers 
transitions among three states: "employment", "unemployment", 
and "generalized non-labor force (GLF)". The person is in 
one of these three states during the input month, and he moves 
to one of these three states for the output month, according 
to a transition probability which depends on his age, sex, 
marital status, and region. This kind of transition will occur 
to most people and is made possible by defining all school 
states to be within the GLF. A person simply passes from 


one month to the next until the whole year is completed.


For people who are in school, however, certain
exceptions and assumptions relating to the above procedure 
must be noted. First, if a person is in school it is assumed 
he cannot drop out. Thus from September through March, 
transitions from GLF (which in this case is equivalent to 
some school state) to employment or unemployment are not 
allowed. But for input months April through August normal 
transitions take place, thus allowing students to move into 


the labor force during the summer months. 
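The no-dropout rule can be expressed as a modification of the
GLF row of the transition matrix. The sketch below assumes that a
disallowed transition is handled by keeping the student in GLF
with probability one; the report does not state how the program
implements the restriction, and the names used are hypothetical.

```python
# Input months September through March, numbered 1 = January ... 12 = December.
NO_DROPOUT_INPUT_MONTHS = {9, 10, 11, 12, 1, 2, 3}


def effective_glf_row(row, in_school, input_month):
    """Return the (to E, to U, to GLF) probabilities actually used.

    row is the estimated GLF row for the person's group. For a student in
    a no-dropout month the row is replaced so that he stays in GLF.
    """
    if in_school and input_month in NO_DROPOUT_INPUT_MONTHS:
        return (0.0, 0.0, 1.0)
    return tuple(row)
```

For input months April through August the estimated row is used
unchanged, which is what lets students enter the labor force in
the summer.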


For transitions in the two months May and June, 
it is necessary to distinguish between NLF and school for 
those who enter the GLF state in these months. It will be 
recalled that GLF includes both of the former states. The 
required distinction is made as follows. It is assumed that 
GLF means NLF except for those persons who were in school in 
the input state. For these latter individuals GLF in the 
output month is assumed to mean the same school state that 
the person was in during the input month. That is, a
transition from GLF to GLF for students is recognized as being a
transition, say, from Grade 11 to Grade 11. For those who
are not students it is a transition from NLF to NLF.


We now come to the question of placing a person
in school. It is first assumed that the only people for
whom it is possible to start a new school year are those
who pass into the GLF state in September. The problem then
reduces to distinguishing between those who enter some
school state and those who enter NLF. This is done on the
basis of a probability that is conditional on the person's
age, sex, marital status and April Activity Status. If it
turns out that a person does enter a school state, a transition
matrix determines what his new school state will be, and his
education status is updated. The transition matrix is
described below in 5.5.1.


Once the monthly simulation has been completed it
is a simple matter to total up the number of weeks spent in
each of the four states: employment, unemployment, non-labor
force, and school. It remains then to assign "TYPE"
to those who become new full-time labor force participants
in the year just simulated. A person is assumed to have
joined the labor force if his present "TYPE" is 3, and if in
the present year he spent zero weeks in school and some
weeks either employed or unemployed. TYPE is assigned on
the basis of education, province, and sex. Class A persons
(whose TYPE will be either 140 or 150) are distinguished by
education. These are the people who will never be unemployed,
and Class A status is assumed to apply to all male university
graduates. Whether or not a person is a participant in a
private pension plan (thus distinguishing between the "40's"
and the "50's") is determined on the basis of a sex-province
distribution.
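The assignment of TYPE to new participants can be sketched as
below. The sketch assumes that Class A status applies only to male
university graduates and that everyone else becomes Class B; the
pension probability p_pension stands in for the sex-province
distribution, whose values are not given here. All names are
hypothetical.

```python
def assign_new_type(type_code, weeks_school, weeks_emp, weeks_unemp,
                    sex, university_graduate, p_pension, draw):
    """Return the TYPE code after the year just simulated.

    p_pension: assumed probability of belonging to a private pension plan
               for the person's sex and province.
    draw:      a uniform random number in [0, 1).
    """
    joined = (type_code == 3 and weeks_school == 0
              and (weeks_emp + weeks_unemp) > 0)
    if not joined:
        return type_code
    class_digit = 1 if (sex == "M" and university_graduate) else 2
    pension_digit = 5 if draw < p_pension else 4
    # The trailing zero marks the first year as a participant.
    return class_digit * 100 + pension_digit * 10
```

The trailing zero is what the following year's recoding step
removes.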
5.3 The Labor Force Model 
5.3.1 The Basic Model 


The Activity Block centers around a Markov-chain
model of the labor force which is heavily dependent on a
similar labour force model developed by Donald Dawson and
Frank Denton at McMaster University*. The idea behind the
Denton-Dawson model is to abstract from the complex of
factors that describe both sides of the market (the
labor-leisure choice, wage rates, job vacancies, aggregate demand,
etc.) by assuming that all of these factors are adequately
represented by the unemployment rate. Thus both supply and
demand in the labor market are considered only implicitly.
The problem that the model sets out to solve is that of
describing, month by month, the labor force activities of
certain kinds of persons, given only an unemployment rate that
applies to all persons.


To be more specific, assume that at any given time
a person must be in one of three mutually exclusive states.
He must be either employed (E), unemployed (U), or in the
non-labor force (N). Assume further that a person can only
move from one of these states to another at the beginning
of a month, and that he must then "reside" in that state
for the whole month. Assume finally that we wish to know what
a person does through some time period, say a year, and that
we know which of the 3 states he is in at the beginning
of the period. Then if we knew the probability of moving from
one state to another in each of the given months, we could
trace out a month by month history of the individual for the
year. Calculating these probabilities is the purpose of the
Denton-Dawson Model.
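Given the monthly matrices, tracing out a history is a matter
of repeated sampling. The following is a minimal sketch of that
idea, not the POLSIM program itself; the matrix layout (a
dictionary of rows) and the function names are assumptions made
for the example.

```python
STATES = ("E", "U", "N")


def sample_state(row, draw):
    """Pick the next state from one matrix row using a uniform draw."""
    cumulative = 0.0
    for state in STATES:
        cumulative += row[state]
        if draw < cumulative:
            return state
    return STATES[-1]  # guards against rounding when the row sums to ~1


def trace_year(start, matrices, draws):
    """Return the sequence of states, beginning with the starting state.

    matrices: one transition matrix per month-to-month step.
    draws:    one uniform random number in [0, 1) per step.
    """
    history = [start]
    for matrix, draw in zip(matrices, draws):
        history.append(sample_state(matrix[history[-1]], draw))
    return history
```

With twelve matrices and twelve draws the result is a thirteen
element history running from one April to the next.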


* F.T. Denton and D.A. Dawson, "The OTA Simulation System",
Report prepared for the Department of Manpower and Immigration,
June 1971.
For any given pair of months, 9 probabilities need
to be calculated, and these can be designated by a transition
matrix:


                     State in month t+1

                     E          U          N
State     E      $P_{11}$   $P_{12}$   $P_{13}$
at        U      $P_{21}$   $P_{22}$   $P_{23}$
month t   N      $P_{31}$   $P_{32}$   $P_{33}$


Thus $P_{ij}$ is the probability of moving from state i in month
t to state j in month t+1. This probability is assumed by
Denton-Dawson to depend on two sets of factors. The first
set is the demographic characteristics of the individual,
specifically his age and sex. Thus if the population is
broken down into 9 age groups and 2 sexes, 18 of the above
matrices would have to be calculated for each pair of months
being considered. Age and sex we denote as stratification
variables.


The second set of factors can be called functional 
variables. Given a certain stratification of the population, 
what will the transition probabilities depend on? Denton-Dawson
assume they will be functions of three things: the average 
level of unemployment in the two months being considered, the 
change in unemployment in the two months, and a seasonal 


factor indicative of the months themselves. 


For each age-sex group then, the following equation 


can be specified: 
[The equation is not legible in this copy.]

where $P_{ij}(t)$ is the transition probability between state i in
month t and state j in month t+1. $M_K$ (K = 1, 2, ..., 11) is a
dummy variable that has value 1 if the calendar month is K
(K is defined to be 1 for January, 2 for February, etc.) and
is 0 otherwise. And $U_t$ is the unadjusted Canadian unemployment
rate in month t.*


The equation includes among its explanatory variables
the mean and difference of unemployment rates in two
consecutive months. This is precisely equivalent to using the
two unemployment rates themselves ($U_t$ and $U_{t+1}$) rather than
linear combinations of them. The given specification is
more useful, however, from the point of view of interpreting
labor force behavior. And since the mean and difference of
unemployment rates are essentially uncorrelated, the problem of
multicollinearity that would otherwise arise is avoided.
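A form of the estimating equation consistent with the
description above (a constant, eleven monthly dummies, and the
mean and difference of the two unemployment rates) is written out
below, together with the algebra behind the equivalence just
claimed. The coefficient symbols a, b, c and d, the error term,
and the sign convention for the difference are assumptions
introduced for illustration; they are not taken from the
Denton-Dawson report.

```latex
P_{ij}(t) = a_{ij} + \sum_{K=1}^{11} b_{ijK} M_K
          + c_{ij}\,\frac{U_t + U_{t+1}}{2}
          + d_{ij}\,(U_{t+1} - U_t) + \varepsilon_{ij}(t)

c_{ij}\,\frac{U_t + U_{t+1}}{2} + d_{ij}\,(U_{t+1} - U_t)
  = \left(\tfrac{c_{ij}}{2} - d_{ij}\right) U_t
  + \left(\tfrac{c_{ij}}{2} + d_{ij}\right) U_{t+1}
```

The second line shows that any pair of coefficients on the mean
and difference corresponds to exactly one pair of coefficients on
the two rates, so the fitted probabilities are the same under
either specification.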


Denton-Dawson estimated these equations by making
use of data** for the 96-month period starting with December
1961 - January 1962, and ending with November-December 1969.
The nine age groups they used were 14, 15-16, 17-19, 20-24,
25-34, 35-44, 45-54, 55-64, and 65-69. With a separate time
series for each age group, each sex, and each of the 9
probabilities, they had 162 equations to estimate in all. The
equations thus estimated were found to provide a good statistical
model of the labor force (in terms of $R^2$, t-tests, F-tests,
etc.). Denton-Dawson also found that test simulations for
periods of up to 10 years, in conjunction with actual historical
unemployment rates of the 1960's, reproduced quite closely
the actual patterns of movement of the Canadian E, U, and NLF
series, as recorded by the monthly DBS Labor Force Survey.


* It will be noted that the equation includes only eleven seasonal
dummy variables. This is because it is not possible to estimate
a regression equation that has both a constant term and
a set of dummy terms that sum exactly to unity; the least
squares normal equations must not be dependent. The decision
as to which month to drop is quite arbitrary. Exactly the same
values for the dependent variable will be generated regardless
of which dummy variable is dropped.

** These data were taken from the Labor Force Survey.


In dealing with transition probabilities, two
difficulties are frequently encountered. The first is that
since they are probabilities, they must lie in the unit interval.
That is, $0 \le P_{ij} \le 1$ for all i and j. Secondly, since the three
states are both mutually exclusive and exhaustive, it is
necessary that each row of the transition matrix sum to one,
i.e., $\sum_{j=1}^{3} P_{ij} = 1$ for all i. This latter constraint is
satisfied by the original time series data, and it is also
satisfied by the generated probabilities because of a convenient
property of least squares estimation: if a set of dependent
variables is subject to a linear equality constraint and if
these variables are all regressed on an identical set of
independent variables, any estimates of the dependent variables
derived from the regression equations will also be subject
to the linear constraint. But the unit interval constraint
need not necessarily be satisfied. It is possible to generate
negative probabilities, and probabilities that are greater
than one. Any cases of this happening are handled by the
computer program, which sets the aberrant probability to
either zero or one and then normalizes the relevant row.
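The repair of an aberrant row can be sketched in a few lines.
This is an illustration of the clip-then-normalize rule as stated,
not the program's actual code; the function name is hypothetical.

```python
def repair_row(row):
    """Clip each probability to [0, 1], then rescale the row to sum to one."""
    clipped = [min(1.0, max(0.0, p)) for p in row]
    total = sum(clipped)
    if total == 0.0:
        raise ValueError("row has no positive probability to normalize")
    return [p / total for p in clipped]
```

For example, a generated row of (0.6, 0.5, -0.1) sums to one but
contains a negative entry; after repair the third entry is zero
and the first two are rescaled in proportion so that the row again
sums to one.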


Estimation of the required transition matrices requires 
exogenous input of seasonally unadjusted unemployment rates. 


This is quite inconvenient, since one is far more likely to
have a "feel" for seasonally adjusted rates. Or one would
like to input a "low" set of rates and then compare the
results of these with the results generated by "medium" or
"high" rates. Again, seasonally adjusted rates would be far
more convenient for this purpose.


Denton-Dawson solve this difficulty by generating
a set of equations that will translate an adjusted rate into
an unadjusted rate. The equation (not legible in this copy)
expresses $U_t$ in terms of $U^{A}_t$ and a cubic polynomial in t,
with coefficients specific to month K,
where $U_t$ is the unadjusted unemployment rate, $U^{A}_t$ is the
seasonally adjusted rate, and t represents time. There are
12 such equations, one for each month K. The cubic trend
polynomial is included to allow for autonomous shifts in
seasonal patterns over time. These equations were estimated
from time series data in the 1959-69 period, and were found
to perform very well. It is thus possible to specify seasonally
adjusted unemployment rates as input to the model.


Although the Denton-Dawson model just described was 
found to work well in simulating age-sex distributions of 
employment experience over the historical period in which the 
equations were estimated, it is in some respects not entirely
suitable for the purposes of POLSIM. Three difficulties arise. 
The first is that the model makes no allowance for structural 
changes in the labor market over time. If, for example, the 
efforts of government programs toward correcting seasonal 
jumps in unemployment figures have been increasingly successful 


over the last 10 years, then Denton-Dawson's equations will 




result in an upward bias in the winter unemployment totals 
when we come to simulate the future. Second, to allow for a 
more realistic simulation of wage income, POLSIM distinguishes 
between two kinds of persons. That is, Class A persons who 
are assumed to never become unemployed, and whose wage 

income transitions alone thus determine what their annual 
income will be; and Class B persons who are subject to 
unemployment, and hence whose weeks of employment and weekly 
wage rate transitions combined will determine their annual 
wage income. The problem this creates in the context of the 
Denton-Dawson model is that their probabilities apply to all 
persons in the labor force. POLSIM assumes that the probabilities 
will only apply to a subset of the population (the Class B 
persons), and it is therefore necessary to adjust the generated 
probabilities to account for this. The third difficulty is
that the Denton-Dawson model only stratifies the probabilities 
on age and sex. This is a serious limitation since a person's 
employment experience depends on other variables as well, 

most notably region and marital status. If one is interested
in the most accurate micro-simulation possible, it becomes 
necessary to disaggregate the probabilities on these variables 


as well. 


In the following sections we discuss the methods 
which were adopted to overcome each of the three difficulties 
mentioned above: the problem of adjusting the model to 
account for structural shifts in the labor market, the 
problem of modifying the Denton-Dawson model to account for
the fact that POLSIM distinguishes different classes of 
persons, and the problem of disaggregating the Denton-Dawson 


model. 
5.3.2 Model Adjustment to Account for Labor Market Structural Shift


The problem here can be stated quite simply. One
generates simulation equations based on data from some
historical period. When some part of this period is
simulated, or when some future period is simulated, the
simulated figures generated do not match exactly the corresponding
actual figures. There are of course many reasons for this.
But one very important reason may be that the world has in
some way "changed" since the period in which the parameters
were estimated, and therefore the historically derived
parameters are not altogether suitable. It is reasonable to
believe that this is the case with the labor market. Government
stabilization programs, changing educational levels of the
labor force, increased participation rates on the part of
women, increased bureaucratization, increased job security
engendered by unions, and the increased attractiveness of
unemployment made possible by changes in such programs as
unemployment insurance, all lead to a different underlying
structure of the labor market. The problem is to account
for this in the model.


There are two basic approaches that can be made to
this adjustment problem. The first is to adjust the regression
coefficients so that they will accurately generate the
historical probabilities that correspond to a period as
close to the one being simulated as possible. The second is
to adjust the coefficients so that they generate probabilities
that in turn generate employment-unemployment-non labor
force distributions that correspond to those observed in the
most recent period. The former method was the one adopted,
because it was the more convenient and straightforward. The
latter method is conceptually much more difficult, but would
provide a better result. A preliminary analysis of how one
might approach the second method is included in Appendix D.
It could be incorporated in future versions of the model.
The method employed in the current version of POLSIM is given
below.


The data Denton-Dawson used to estimate their 
probabilities ended with the months November-December 1969. 
Data does exist, however, for the full year from December 
'70-January '71 to November-December of 1971. An easy check 
on the regression equations, then, is to generate simulated
probabilities for the year 1971 and compare these with the 
actual probabilities for that year. This was done, using 
measured 1971 unemployment rates, and significant differences 
were observed. The adjustment was then quite simple. 


Let $p_i$ = the observed probability in month i,

    $q_i$ = the simulated probability in month i,

and $D_i = p_i - q_i$.

All that was necessary was to adjust the regression equations
so that they generated p rather than q. Since there would
be a different $D_i$ for every month, and since there is a
dummy variable in each month, the dummy variable coefficients
were all increased by the relevant difference. That is,
if $r_i$ is the original dummy variable coefficient in month i,
and $s_i$ is the new dummy variable coefficient in month i,
then $s_i = r_i + D_i$.

This adjustment was carried out for every month and for all
age-sex groups.
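The coefficient shift can be sketched as follows for a single
transition probability and a single age-sex group. The dictionary
layout (month number mapped to a value) and the function name are
assumptions made for the example.

```python
def adjust_dummy_coefficients(r, observed, simulated):
    """Return new dummy coefficients s_i = r_i + (p_i - q_i).

    r:         original dummy coefficient for each month i.
    observed:  observed 1971 probability p_i for each month.
    simulated: probability q_i generated by the original equation.
    """
    return {month: r[month] + (observed[month] - simulated[month])
            for month in r}
```

Because a month's dummy coefficient enters the fitted probability
additively, raising it by the difference moves the simulated 1971
value for that month from q to p and leaves the other months
untouched.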




Two sets of regression coefficients thus existed,
and they were compared by using each set to simulate the
period from April 1972 to April 1973. The adjusted equations
performed much better, as can be seen by referring to the
Validation Section that follows.


5.3.3 Adjustment for Class of Person


The Denton-Dawson probabilities apply to the whole 
population. POLSIM, however, distinguishes two kinds of 
individuals. "Class A" individuals are assumed to be such 
that they never enter the unemployment state. They will 
always be either in NLF or employment. "Class B" individuals 
will be subject to unemployment and will hence be the only 
ones who make normal transitions among the three labor force 
states. Class B individuals must thus absorb all of the
unemployment states, while Class A individuals will absorb
a large proportion of the employment states. Since the
transition matrices will now apply to Class B persons only,
and since we still wish to generate the same totals in each
state that we would generate if they applied to the whole 
population, it is necessary to adjust the Denton-Dawson 
probabilities upwards and downwards, as the case may be. 


We proceed as follows: 
1. Assume the initial population is given as follows:

$E(t)$ = total number of employed individuals at time t.

$E_1(t)$ = number of Class A individuals employed at time t.

$E_2(t)$ = number of Class B individuals employed at time t.

$U(t)$ = number of individuals unemployed at time t
       = number of Class B individuals unemployed at time t.

$N(t)$ = number of individuals in NLF at time t.

$N_1(t)$ = number of Class A individuals in NLF at time t.

$N_2(t)$ = number of Class B individuals in NLF at time t.


2. Let $P_{ij}$ = probability of moving from state i to state
j as calculated by Denton-Dawson,
where i, j = E, U, or N.

3. Let $p'_{ij}$ = adjusted probability of moving from state
i to state j,
where i, j = $e_1$, $e_2$, u, $n_1$, or $n_2$.

4. We can state that the adjusted transition
matrix we are interested in will contain some zeros. They are
designated in the following matrix:
There are thus 13 probabilities to be calculated:

5. To derive these unknown probabilities from the existing
Denton-Dawson probabilities, it may be noted that the total number
of people moving from one Denton-Dawson state (E, U, or N) to another
Denton-Dawson state (E, U, or N) must be equal to the number
of people moving between the same two states in the new model.
Thus: (a) Employment - Employment

total number moving from E to E in Denton-Dawson = $P_{EE} E(t)$
(d) Unemployment - Employment

We also can see from examining the transition
matrix that

6. We thus have 11 equations in 13 unknowns. By
substituting equation (10) into equation (1) and equation (11) into
equation (9) we eliminate two of the unknowns. We can
then write the remaining 9 equations in 3 independent sets.
7. It can be seen that the equations in Set #2 determine
five of the unknown probabilities explicitly.

The transition matrix thus looks like:


Year t+1 
8. From Set 1 we can determine the 3 unknowns in the top 2
rows. To do this it is necessary to specify from some outside
source the probability $p'_{e_1 e_1}$.

9. Similarly Set 3 determines the unknowns in the bottom
2 rows. Here it is necessary to first specify $p'_{n_1 n_1}$.

10. For age classes less than 65 years it is reasonable
to specify
$p'_{e_1 e_1}$ = 1.0 and $p'_{e_1 n_1}$ = 0.0
11. A not unreasonable hypothesis is that

$p'_{n_1 e_1}$ = 0.0 for all age classes

This in effect means that we allow no transitions for Class A
individuals from NLF to employment. In fact, we assume that
$N_1$ = 0 initially, i.e., the number of Class A individuals in
NLF initially is zero. This means that "Class A" individuals
are defined (initially) to be people in "preferred" occupations,
say. A person not in the labor force would be just that: we
wouldn't distinguish "preferred" NLF from "ordinary" NLF. The only
people to ever enter $N_1$ would then be Class A people who retire.
And they will never return to $E_1$. This procedure ensures some
consistency between the assumptions here and the assumptions in 10
directly above.


12. If the assumptions discussed in 10 and 11 are made, the
following two matrices can be derived:

(a) for ages < 65

$e_1$   $e_2$   $u$   $n_1$   $n_2$

(b) for ages > 65
given age-sex group, the existing probability matrices can
be transformed into the above "adjusted" matrices. (In
practice only the first row of the existing 3x3 matrices,
which corresponds to the $e_2$ row in the adjusted matrices,
will be altered. The $e_1$ and $n_1$ rows in the adjusted matrices
are really deterministic, and hence the transitions they
imply can be carried out without recourse to random numbers
and the probability matrix.)


14. In principle, the adjustment parameters defined
above apply only to a given time period. They depend on the
total number of Class A and Class B individuals employed at
a given moment in time, and these can be expected to change
as time progresses. For the present, however, we assume
these changes do not significantly alter the relevant parameters
($E/E_2$ and $E_1/E_2$), and hence the latter will remain constant
over the whole period simulated.
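The role of the two ratios can be made concrete with a short
sketch for ages under 65. It is a reconstruction, not the report's
own derivation: it applies the flow-conservation principle stated
above to the employed row only, and it assumes that Class A
individuals remain employed with probability 1.0 and that only
Class B individuals can move from employment to U or NLF. The
function and argument names are hypothetical.

```python
def adjust_employed_row(p_ee, p_eu, p_en, e_total, e_class_a):
    """Rescale the Denton-Dawson E row so that it applies to Class B only.

    Flow conservation (E = e_total, E1 = e_class_a, E2 = E - E1):
      E -> E:  p_ee * E = 1.0 * E1 + new_ee * E2
      E -> U:  p_eu * E = new_eu * E2
      E -> N:  p_en * E = new_en * E2
    """
    e_class_b = e_total - e_class_a
    if e_class_b <= 0:
        raise ValueError("there must be some Class B employed individuals")
    scale = e_total / e_class_b      # the ratio E / E2
    offset = e_class_a / e_class_b   # the ratio E1 / E2
    return (p_ee * scale - offset, p_eu * scale, p_en * scale)
```

With 1000 employed of whom 200 are Class A, a Denton-Dawson row
of (0.90, 0.04, 0.06) becomes (0.875, 0.05, 0.075): the 800 Class B
workers then supply the same 40 moves into unemployment and 60
moves into NLF that the original row implied for all 1000. The
adjusted row still sums to one, since the original row does.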
5.3.4 The Disaggregation of Transition Matrices
It will be recalled that the transition matrices 

in the Denton-Dawson model are stratified on age and sex 
alone. This entails no particular problem, provided that we 
do not attempt to assess distributional consequences beyond 
the age-sex level. If we are interested in finer distributions, 
however, the Denton-Dawson model loses much of its usefulness. 
Regional implications, for example, are of particular interest 
in Canada. But the Denton-Dawson model does not distinguish 


between regions. 


It thus becomes important to try to further
disaggregate the Denton-Dawson matrices. Broadly speaking,
what we would like to do is select any particular age-sex
matrix, and from it produce 10 matrices (one for each marital
status-region class). These new matrices would hopefully
better reflect the particular classes to which they belong,
while at the same time not changing the aggregate age-sex
distributions they would produce.


In principle, there does not exist any ideal way
to handle this problem. It is simply not possible to create
detail not adequately represented in the original Labour
Force Survey data. Any attempt to disaggregate data to
levels finer than those actually measured will at best yield
approximations. It is the accuracy of these approximations
that either justifies or refutes the method. Having said this,
we proceed.


Suppose that a transition matrix (for a given age
and sex) is given by:
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This matrix produces distorted distributions with respect 


to marital status and region that we would like to correct. 


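We can define a new transition matrix as:

\[
P' = \begin{pmatrix}
p_{ee} + \Delta p & p_{eu} - \Delta p & p_{en} \\
p_{ue} + \alpha\,\Delta p & p_{uu} - \alpha\,\Delta p & p_{un} \\
p_{ne} & p_{nu} & p_{nn}
\end{pmatrix}
\]

where $\Delta p$ is unknown, $\alpha$ is some constant (say .1), and the
$p_{ij}$ are the same as in the original matrix.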

That is, we specify that our new matrix is to be 
such that the first element in the first row is increased 
(or decreased) by a constant amount, and the first element 
in the second row is to be increased (or decreased) by some 
fraction of this amount. The second element in each row is 
then to be adjusted so as to make the sum of the first two 


elements the same as it was originally. 
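
The adjustment rule just described can be sketched in a few lines of Python. This is only an illustration, not part of the POLSIM software: the function name `adjusted_matrix`, the list-of-lists layout, and the state order (employed, unemployed, non-labor force) are assumptions made for the example, and the numbers in the usage note are invented.

```python
def adjusted_matrix(p, delta_p, alpha=0.1):
    """Return a copy of the 3x3 transition matrix p with rows 1 and 2 adjusted.

    Rows and columns are assumed to be ordered
    (employed, unemployed, non-labor force).
    """
    new = [row[:] for row in p]        # leave the original matrix untouched
    new[0][0] += delta_p               # first element of the first row
    new[0][1] -= delta_p               # offset so the first two elements keep their sum
    new[1][0] += alpha * delta_p       # second row moves by a fraction alpha
    new[1][1] -= alpha * delta_p
    return new                         # third row is unchanged
```

For instance, `adjusted_matrix([[0.95, 0.03, 0.02], [0.30, 0.60, 0.10], [0.05, 0.02, 0.93]], 0.01)` raises the employed-to-employed entry to 0.96 and leaves every row summing to one.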


Let us now assume that P' is a matrix that produces
exact results for a given marital status and region class.
That is, for a given age-sex class, it is being postulated
that a "good" matrix for one of the marital status-region
classes can be obtained by slightly adjusting the original
matrix in the way outlined above. Similarly, a "good"
matrix for another of the marital status-region classes can
be obtained by a similar adjustment, with a different
parameter (a different $\Delta p$).


The problem is now as follows. Select a particular
age-sex group, and hence a particular matrix. Select also
one of the ten marital status-region groups. For the above
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assumption to be true (the assumption that the new matrix 
will give exact results) there must exist a $\Delta p(t)$ such that
for the given marital status and region the actual number of
employeds in period t will equal the number of employeds
simulated by the new matrix in period t, and the actual
number of unemployeds in period t will equal the number of
unemployeds simulated by the new matrix in period t. The
problem is simply to find $\Delta p(t)$ for the particular marital
status, region, and month chosen. If we repeat the process
for every marital status and region, we will generate 10 new
matrices (2 marital statuses x 5 regions) for every one of the
old ones. And hopefully these will produce accurate simulations
on all four of our variables: age, sex, marital status, and region.


The derivation of $\Delta p(t)$ now follows.

1. Let $E_A(t)$ = number of actual employed persons at time t
                  (for a given marital status-region group)
       $E_N(t)$ = new total simulated at time t (from new matrix)
       $E_O(t)$ = old total simulated at time t (from original matrix)
       $U_A(t)$ = number of actual unemployed persons at time t
                  (for a given marital status-region group)
       $U_N(t)$ = new total simulated at time t
       $U_O(t)$ = old total simulated at time t

The problem is to find $\Delta p(t)$ such that $E_A(t) = E_N(t)$ and
$U_A(t) = U_N(t)$ for all t.

2. We can write:

\[ E_N(t+1) = (p_{ee} + \Delta p)E_N(t) + (p_{ue} + \alpha\,\Delta p)U_N(t) + p_{ne}N(t) \qquad (1) \]




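For t = 1,

$E_N(1) = E_A(1)$ = measured employed on initial year tape
$U_N(1) = U_A(1)$ = measured unemployed on initial year tape

Equation (1) then becomes

\[ E_N(2) = p_{ee}E_N(1) + p_{ue}U_N(1) + p_{ne}N(1) + \Delta p(1)\,(E_N(1) + \alpha U_N(1)) \]

or

\[ E_N(2) = E_O(2) + \Delta p(1)\,(E_N(1) + \alpha U_N(1)) \]

so that, setting $E_N(2) = E_A(2)$,

\[ \Delta p(1) = \frac{E_A(2) - E_O(2)}{E_N(1) + \alpha U_N(1)} \]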
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3. For t = 2, 3, ... we can write equation (1) as:

\[ E_N(t+1) = (p_{ee} + \Delta p)E_N(t) + (p_{ue} + \alpha\,\Delta p)U_N(t) + p_{ne}N(t) \]

Setting $E_N(t) = E_A(t)$, $U_N(t) = U_A(t)$ and $E_N(t+1) = E_A(t+1)$,
we have

\[ E_A(t+1) = p_{ee}E_A(t) + p_{ue}U_A(t) + p_{ne}N(t) + \Delta p\,(E_A(t) + \alpha U_A(t)) \]

We can write $E_A(t)$ as $E_O(t) + E_A(t) - E_O(t)$
and $U_A(t) = U_O(t) + U_A(t) - U_O(t)$, which then gives:

\[ E_A(t+1) = p_{ee}E_O(t) + p_{ue}U_O(t) + p_{ne}N(t) + p_{ee}(E_A(t) - E_O(t)) + p_{ue}(U_A(t) - U_O(t)) + \Delta p\,(E_A(t) + \alpha U_A(t)) \]
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\[ {} = E_O(t+1) + p_{ee}(E_A(t) - E_O(t)) + p_{ue}(U_A(t) - U_O(t)) + \Delta p\,(E_A(t) + \alpha U_A(t)) \]
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Now because we force the total actual labor force to 
equal the total simulated labor force for purposes of validation 
(thus ignoring any errors caused by leaks to the non labor force) 
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Then,

\[ E_A(t+1) = E_O(t+1) + (p_{ee} - p_{ue})\delta + \Delta p\,(E_A(t) + \alpha U_A(t)) \]

or

\[ \Delta p(t) = \frac{E_A(t+1) - E_O(t+1) - (p_{ee} - p_{ue})\delta}{E_A(t) + \alpha U_A(t)} \]
4. It remains to explain the meaning of this equation.
First of all, we can note that there are two reasons why the
simulated totals differ from the actual totals. The first is
simply that the transition matrices, the $p_{ij}$, are incorrect.
The second reason is that as the simulation proceeds, the
transition matrices are multiplying state totals that are
incorrect. They operate on the totals produced by the incorrect
$p_{ij}$ rather than the actual totals.

In explaining the derived equation it will simplify
matters if we assume $E_A$ is larger than $E_O$. That is, we are
simulating too few employed people and too many unemployed.
Considering the numerator, the term $E_A(t+1) - E_O(t+1)$ is simply
the total deficiency of employed persons being simulated. Because
too few employed persons were simulated for period t (the
deficiency being equal to $\delta$), $p_{ee}\delta$ will be the deficiency
in period t+1 arising for this reason from the employment
transition. And similarly, $p_{ue}\delta$ will be the excess
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employeds simulated in period t+l because too many unemployeds 
were simulated in period t. The term $(p_{ee} - p_{ue})\delta$ is thus the total
deficiency of employeds in t+1 arising from the creation of too
few employeds in period t. The numerator is therefore the
deficiency of employed persons in period t+1 that can be explained
by the errors in the $p_{ij}$ alone.


Intuitively, one could more easily understand the equation
for $\Delta p(t)$ if it were given by

\[ \Delta p(t) = \frac{E_A(t+1) - E_O(t+1)}{E_A(t)} \]

It would simply state that $p_{ee}$ for period t to t+1 should be
increased by the relative amount that the actual employed in period
t+1 exceeds the simulated employed in period t+1. But this would
not be wholly accurate. This is because it would be putting the
full burden of adjustment on the $p_{ee}$ term. And we wish to let the
$p_{ue}$ term absorb part of the adjustment. This explains the 2nd term
in the denominator of the derived equation. The more that the
second row can account for the deficiency of employeds (the higher
are $\alpha$ and $U_A(t)$) the less the adjustment ($\Delta p$) that falls on
the first row.


The addition of this term to the denominator would still
not make the intuitive equation correct however. We do not wish
to adjust for the full error of actual persons over simulated
persons, because we know that part of this error is a cumulative
effect caused by previous errors. Thus it is necessary to subtract
off this cumulative error, namely the $(p_{ee} - p_{ue})\delta$ term.
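
As a check on the reasoning above, the derived equation can be evaluated numerically. The sketch below is illustrative only: the function name `delta_p` and its argument names are assumptions, $\delta$ is taken to be the deficiency of simulated employed in period t (actual minus old simulated), and the figures in the usage note are invented so that the true adjustment is known in advance.

```python
def delta_p(e_act_next, e_old_next, e_act, u_act, e_old, p_ee, p_ue, alpha=0.1):
    """Adjustment to p_ee for the transition from period t to t+1.

    e_act, u_act : actual employed and unemployed in period t
    e_old        : employed simulated by the original matrix in period t
    e_act_next   : actual employed in period t+1
    e_old_next   : employed simulated by the original matrix in period t+1
    """
    delta = e_act - e_old                      # deficiency carried over from period t
    cumulative = (p_ee - p_ue) * delta         # part of the gap due to earlier errors
    return (e_act_next - e_old_next - cumulative) / (e_act + alpha * u_act)
```

With p_ee = 0.95, p_ue = 0.30, actual totals of 900 employed and 100 unemployed, old simulated totals of 880 and 120, and next-period totals of 944.1 (actual) and 922 (old simulated), the function returns 0.01, which is the adjustment that was used to construct the example.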
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5.4 Validation of the Activity Block 
5.4.1 Theoretical Aspects

The validation of the Activity Block proceeds as a slight
generalization of the principles elucidated in section 4.3.1. We
will begin by mathematically describing the labor force Markov process.


We define the following variables:

(a) $X_t = (E_t, U_t, N_t)$ is the vector of labor force states
    at time t (t = 0 is the initial period).

    That is, $E_t$ = total number of employed people at time t,
             $U_t$ = total unemployed at time t,
             $N_t$ = total non labor force at time t.
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(b)
\[
P_t = \begin{pmatrix}
p_{ee} & p_{eu} & p_{en} \\
p_{ue} & p_{uu} & p_{un} \\
p_{ne} & p_{nu} & p_{nn}
\end{pmatrix}
\]
    = the transition matrix for period t-1 to t.

$S_t = P_1 P_2 \cdots P_t$
    = the transition matrix from period zero to period t.

Then,

\[ X_1 = X_0 P_1 = X_0 S_1 \]
\[ X_2 = X_1 P_2 = X_0 P_1 P_2 = X_0 S_2 \]
\[ X_t = X_{t-1} P_t = X_0 S_t \]
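
The recursion above amounts to repeated multiplication of a row vector by the period matrices. A minimal sketch, assuming plain Python lists and the state order (employed, unemployed, non-labor force); the names `step` and `simulate` are hypothetical and not taken from the model software.

```python
def step(x, p):
    """Multiply the row vector x by the matrix p (one period of the process)."""
    return [sum(x[i] * p[i][j] for i in range(len(x))) for j in range(len(p[0]))]


def simulate(x0, matrices):
    """Return [X_0, X_1, ..., X_t] given X_0 and the list [P_1, ..., P_t]."""
    path = [list(x0)]
    for p in matrices:
        path.append(step(path[-1], p))   # X_t = X_{t-1} P_t
    return path
```

Because each row of a transition matrix sums to one, the total population in `path[t]` is the same in every period; only its split among the three states changes.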
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Now if both $X_0$ and the S's were known exactly, and
if the simulation process was perfect, then the simulated $X_t$'s
would also be perfect. But none of these conditions hold.
Rather, the simulated X's can be written as

\[
\begin{aligned}
\hat{X}_t &= (X_0 + \Delta X_0)(S_t + R_t) + \varepsilon_t \\
          &= X_0 S_t + X_0 R_t + \Delta X_0 S_t + \Delta X_0 R_t + \varepsilon_t \\
          &= X_t + X_0 R_t + \Delta X_0 S_t + \Delta X_0 R_t + \varepsilon_t
\end{aligned}
\]

where,

$R_t$ = the error transition matrix from period zero to
        period t. This is also called "parameter error".
$\Delta X_0$ = the error in the initial labor force vector, or
        "initial state error".
$\varepsilon_t$ = the simulation error. This error arises because
        we sample from a probability distribution using a
        random number generator. It can be thought of as
        the same kind of error that arises when we toss a
        fair coin. We know that if we toss a coin 100 times,
        we should expect 50 heads and 50 tails. In fact, however,
        we know that if we performed the experiment many
        times, we would find that the observed number of heads
        was binomially distributed, with a mean of 50 and a
        variance of 25. We would hardly ever hit 50 right on,
        and the difference above or below 50 could be called
        the "simulation error".
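
The coin-toss figures quoted above follow from the standard binomial formulas (mean np, variance np(1-p)). The sketch below computes them and also draws one simulated experiment; the function names are illustrative only, and the random draw uses Python's standard `random` module rather than the generator used in the model.

```python
import random


def binomial_mean_variance(n, p):
    """Mean and variance of the number of successes in n independent trials."""
    return n * p, n * p * (1 - p)


def toss_coins(n, rng):
    """Count heads in n tosses of a fair coin using the supplied generator."""
    return sum(1 for _ in range(n) if rng.random() < 0.5)
```

`binomial_mean_variance(100, 0.5)` gives a mean of 50 and a variance of 25, i.e. a standard deviation of 5 heads, while a call such as `toss_coins(100, random.Random(1))` returns one realized count whose distance from 50 is the "simulation error" of that run.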
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In the expression for $\hat{X}_t$ we can ignore the
second order term, $\Delta X_0 R_t$, and write,
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\[ \hat{X}_t - X_t = X_0 R_t + \Delta X_0 S_t + \varepsilon_t \]

    = parameter error + initial state error
      + simulation error.


Ideally, we would like to distinguish these three 
kinds of errors in any analysis of simulation results. 
Unfortunately, this would be both difficult and costly. 
Since the errors that do result in the Activity Block are 
quite small (see below), a slightly different approach is 
therefore taken. 

The simulated results are compared with the actual 
totals in two ways. The first examines the total error. 
Simulated and actual values for both employment and unemployment 
are compared, and the percentage error calculated. These
comparisons are shown in the top rows of the tables in Appendix D.
The total error thus presented is not really indicative of
how well the simulation performs, however, since a large
proportion of the error is the result of initial state error. In
an attempt to correct for this, an "adjusted" comparison is given
in the second row of each of the tables. In this comparison, the
simulated values are adjusted pro rata so as to add up in total to
the actual measured values. That is, the simulated employed and
unemployed totals are adjusted either up or down so that their sum
will be the same as the sum of the corresponding measured totals.
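
The pro rata adjustment can be illustrated as follows. This is a sketch under the assumption that a single scale factor is applied to both simulated totals; the function name and the figures in the usage note are invented for the example.

```python
def adjust_pro_rata(sim_employed, sim_unemployed, act_employed, act_unemployed):
    """Scale the simulated totals so their sum equals the measured labor force."""
    factor = (act_employed + act_unemployed) / (sim_employed + sim_unemployed)
    return sim_employed * factor, sim_unemployed * factor
```

If 800 employed and 100 unemployed are simulated against measured totals of 900 and 100, both simulated figures are multiplied by 1000/900. Their sum then matches the measured labor force of 1000, while the simulated split between employed and unemployed (8 to 1) is unchanged.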




The adjusted simulation results calculated in this 
way represent a correction for the initial state error. The 
model population that POLSIM works with is somewhat smaller 
than the total Canadian population*, and comparisons between 
actual and simulated values are obviously going to reflect 
this. The adjustment process just brings both populations 
to the same total size, without changing the distributions
within these populations. This does not correct the whole
of the initial state error however. If the distribution of
the initial population is also in error, in addition to the
size of the population, then some initial state error will
remain. Inspection of the initial distributions indicates
that this latter kind of error does exist, but that it is
small. Since correction would involve some rather tenuous
assumptions, the effects of this error are ignored in the
present analysis.

The differences between the adjusted simulated
totals and the actual totals are thus for the most part a
result of the combination of simulation error and parameter
error. Since the sum of these errors is for the most part
very small, no attempt was made to distinguish them.
5.4.2 Validation Results 


The Activity Status Block was validated by simulating 
the thirteen months from April 1972 to April 1973. Simulated 


labor force aggregates (total unemployed, total employed, 


* Recall from Chapter 2 that the Survey of Consumer Finance 
does not include persons in institutions, Indians on reservations, 
or people who live in the Yukon or North West Territories. 
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total male unemployed, etc.) were computed and compared with 
the corresponding actual values as measured by the labor 


force survey. The results of this comparison are presented 


in detail in Appendix D. 


The validation can be summarized with reference
to the graphs in figure 5.3. The graphs illustrate the
difference between the results obtained using the original
Denton-Dawson parameters, and the results obtained when the
various adjustments to these parameters (see sections 5.3.2 -
5.3.4) were introduced. The graphs also show how the simulated
totals compare with the actual totals.


The first four graphs plot the total number of 
unemployed persons, and show the cumulative and net effect 
of the various adjustments that were made to the parameters. 
The first graph compares the original simulation to the final 
simulation. The final set of parameters, it will be recalled, 
contains all three adjustments: the adjustment necessitated 
by the distinction of Class A and Class B persons (the "Class
A adjustment"), the adjustment to account for structural
shifts in the labor market (the "Calibrated adjustment"), and 
the adjustment necessary to disaggregate the parameters to 
region and marital status (the "Disaggregation adjustment"). 
It can be seen that there is a considerable improvement in 
the results obtained using the final set of parameters as 
compared with the original set. The average error with the 
original parameters was 9.4%. With the adjusted parameters, 


the average error was 2.3%. 
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The second graph compares the original simulation 
with the Class A adjustment. In the absence of simulation 
error, the two sets of parameters should yield identical 
results, and in fact they generally do. The only significant 
difference between the two simulations occurs in the base 
month. This is not a reflection on the adjustment process 
itself, which clearly "works" more than adequately. The 
difference in the initial month can rather be attributed to
the fact that the process that distinguishes Class A individuals
in the base year population is itself imperfect. Some of those
whom we designate as Class A, and hence as always employed, are
in fact unemployed in the base month. This is not a serious
problem, however, because as can be seen from the graph, the
Markov-chain process soon corrects for this small initial
divergence.


The third graph illustrates the effect of calibration. 
A significant improvement can be seen to have occurred in 


every month. The average error is reduced from 10.4% to 2.4%. 


The last three graphs illustrate the disaggregation
adjustment. Over all, as can be seen from figure 5.34, there
is little difference between the results obtained using the
calibrated parameters, and the results obtained using the
disaggregated parameters. At the highest level of aggregation,
the two sets yield virtually the same results. And this is
as we would expect. But if we examine the totals for the
two marital status classes, we find that the calibrated
parameters significantly overestimate the number of unemployed
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married people, and Significantly underestimate the number 
of unemployed single people. This is again as we would expect,
since the calibrated parameters do not distinguish between
married and single people. The disaggregated parameters, on
the other hand, do make this distinction. And as can be
seen from the graphs, they effect a considerable improvement
at this lower level of aggregation.


Similar comparisons for different age groups,
regions, sexes, and the two marital status classes are given
in Appendix D.
5.5 Data and Inputs


The activity block uses four different sets of 
data. The first set relates to the problem of moving people 
through school. The second set is used to move people 
through the three labor states (employment, unemployment, 
and non-labor force) and to determine retirement and participation 
in pension plans. The third set consists of parameters that 
are used to make various adjustments in the calculated labor 
force transition probabilities. And the last set consists
of unemployment rates and other parameters that are exogenous 
input to the labor force transition probability calculations. 
All of the input data, with the exception of the unemployment 


rates, is listed in Appendix D. 
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5.5.1 School Transitions

1. PSCL(I,J,K,L) = PSCL(10,2,2,22)

This is the conditional probability that a person
in age group I, marital status state J and sex K will be in
school in September, given that he was in state L in the
previous April and that he will be in either school or the
non-labor force in September.


The indices are as follows:

I =  1 if age is 14          6 if age is 19
     2           15          7           20-24
     3           16          8           25-29
     4           17          9           30-34
     5           18         10           35-39

J = 1 if married
    2 if not married

K = 1 if male
    2 if female

L =  1  Grade 9         12  Univ 4
     2  Grade 10        13  Univ 5
     3  Grade 11        14  Univ 6
     4  Grade 12        15  Univ 7
     5  Grade 13        16  Univ 8
     6  CAAT 1          17  Univ 9
     7  CAAT 2          18  Univ 10
     8  CAAT 3          19  Retraining
     9  Univ 1          20  Employed
    10  Univ 2          21  Unemployed
    11  Univ 3          22  NLF

Source

This data is derived from sets of transition
matrices compiled from various sources by
Leroy Stone of Statistics Canada*.


* Leroy O. Stone, "Preparation of Some Demographic and 
Socio-Economic Data Inputs", Statistics Canada Internal 


Report. 
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2. FGG(I,J,K,L,M) = FGG(10,2,2,18,18) 


These are a set of 40 cumulative transition matrices 
(18 x 18) that determine how a person moves from one school 
state to another. There is one matrix for each age (I)- 
marital status (J)-sex(K) group. L is the input state and M 
is the output state. The indices are as described under 
PSCL above. (The input and output states are the 18 school 


states listed under index L above.) 
Source 


These matrices were also derived from the data 


compiled by Leroy Stone. 
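
Because the FGG matrices are stored in cumulative form, a single uniform random number is enough to select the output state. The sketch below shows one common way of doing this with a binary search; it is an assumption about how such a row would be used, not a transcription of the POLSIM code, and the three-state row in the usage note is invented for brevity (an actual FGG row has 18 entries).

```python
import bisect


def next_state(cumulative_row, u):
    """Return the 1-based output state for a uniform draw u in [0, 1).

    cumulative_row[m-1] is the probability of ending in state m or lower,
    so the last entry should equal 1.0.
    """
    index = bisect.bisect_right(cumulative_row, u)
    return min(index, len(cumulative_row) - 1) + 1   # guard against rounding in the last entry
```

With the cumulative row `[0.2, 0.7, 1.0]`, a draw of 0.1 selects state 1, a draw of 0.5 selects state 2, and a draw of 0.95 selects state 3.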
5.5.2 Activity Transitions

The data in this set consists of regression coefficients
from which Labor Force transition matrices are calculated.
The regression equations were derived by Frank Denton and
D.A. Dawson of McMaster University. They have discussed
their methodology in several papers (notes 1-3).

1. The Raw Data

The raw data consists of month-to-month transition
matrices disaggregated by age and sex for the years 1959-69.


The transition probabilities are for movements among the three 


1. D.A. Dawson, "Report on Data Sources Relevant to Simulation
   Models of the Canadian Adult Training System".

2. F.T. Denton, "A Simulation Model for Month-to-Month Labour
   Force Movement in Canada", McMaster University Department
   of Economics: Working Paper No. 72-11.

3. F.T. Denton and D.A. Dawson, "Some Models for Simulating
   Canadian Manpower Flows and Related Systems", McMaster
   University Department of Economics: Working Paper No. 72-14.
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states: employment, unemployment, and non-labor force. 
These data were derived by Denton and Dawson from the labor 
force survey and were used by them to derive the regression 


coefficients which are the basic input to the Activity Block 


Model. 
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These are the five regression coefficients used to 
calculate the seasonally unadjusted unemployment rate in 
month I from the adjusted rate for that month. The regression 


equation is the cubic trend polynomial: 


Vee oto ( (C (1,1) *0URATE (1)/100) "4" Ci i; 2) 454 


Cis) A250 Cds 4) A 5  CUl Oo 


where URATE(I) = seasonally adjusted rate in month I
      VRATE(I) = unadjusted rate in month I (as %)
and   I = 1 for Jan.
          2 for Feb.
          etc.

Source

These coefficients were derived by Denton and
Dawson*.

* F.T. Denton and D.A. Dawson, "The OTA Simulation System",
Report prepared for the Department of Manpower and Immigration,
June 1971.
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These are the regression coefficients for calculating 


the transition matrices. The regression equation is: 


P(I,J,K,L,M) = A(I,J,K,L,1) + A(I,J,K,L,M+1) +
               A(I,J,K,L,13)*UBAR + A(I,J,K,L,14)*DELU

where

I = row of the transition matrix
J = column of the transition matrix
K = age of the person
      1  14           6  35-44
      2  15-16        7  45-54
      3  17-19        8  55-64
      4  20-24        9  65-69
      5  25-34
L = sex
      1  male
      2  female
M = calendar month from which simulation occurs,
    e.g. M = 1 means transition from Jan. to Feb.
         M = 11 means transition from Nov. to Dec.

UBAR = average unadjusted unemployment rate
     = (VRATE(R) + VRATE(R+1))/2
       where R = month from which simulation occurs.

DELU = change in unadjusted unemployment rate
     = VRATE(R+1) - VRATE(R)

Source

These coefficients, which were derived by Denton
and Dawson*, are given in Appendix D.
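
The two unemployment terms defined above, and the way they would enter a regression of this kind, can be sketched as follows. The definitions of UBAR and DELU follow the text; the additive form of `transition_probability` (a constant, one monthly dummy, and the two unemployment terms) is an assumption made for illustration, and the coefficient values in the usage note are invented, not taken from Appendix D.

```python
def ubar_and_delu(vrate_from, vrate_to):
    """Average and change of the unadjusted unemployment rate over one transition.

    vrate_from is VRATE(R) and vrate_to is VRATE(R+1), both in percent.
    """
    ubar = (vrate_from + vrate_to) / 2
    delu = vrate_to - vrate_from
    return ubar, delu


def transition_probability(constant, month_dummy, coef_ubar, coef_delu, ubar, delu):
    """One element of the transition matrix under the assumed linear form."""
    return constant + month_dummy + coef_ubar * ubar + coef_delu * delu
```

For example, with unadjusted rates of 6.0% and 7.0% in successive months, `ubar_and_delu(6.0, 7.0)` returns (6.5, 1.0). Passing these to `transition_probability(0.9, 0.01, -0.005, 0.002, 6.5, 1.0)` gives a probability of about 0.8795.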
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4. The Calibrated A Matrix

The calibrated A matrix is exactly equivalent to
the A matrix described in (3) above. The only difference is
that the dummy values (values 2 through 11) have been adjusted
so as to force the regression equations to yield exactly the
observed probabilities for the year 1971. These new coefficients
are also listed in Appendix D.
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The following raw data exists from which private 


pension eligibility may be inferred: 


a.         Pension Plan Members     Total Paid Workers
                 (000's)                 (000's)
             Male     Female

Nfld 29) 32 Be 25 Nig Ae 
PEL Sy rhe PSPS TI OX 
NB 46.9 Loney 220s 
NS 70.9 2459 Las 
PQ 549.0 STAs 2) Nive Gplke 
ONT 907.4 ope hone 2680. 
MAN 90S 3128 300 
SASK 60% 77 24.4 L265 
ALTA dae ise 5320 498. 
BC 173.4 63.0 W226 

b. Total paid workers by sex:

   Male = 4483, female = 2356

   where a "paid worker" is defined as "those workers
   employed in a situation where an employer-employee
   relationship exists". It thus excludes all unpaid
   family workers and the self-employed.
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These data are derived from: Statistics Canada, 
"Pension Plans in Canada 1970", #74-401 and from the Labor 
Force Survey #71-001 (March '73).


From this data we wish to derive a province-sex 


participation rate distribution. 


Let W(I,M) = number of male paid workers in prov. I
    W(I,F) = number of female paid workers in prov. I
    WT(I)  = total paid workers in province I
           = W(I,M) + W(I,F)

Assume W(I,M)/W(I,F) = a = constant for all provinces
                     = 4483/2356
                     = 1.9

Then W(I,F) = WT(I)/(1+a)   = WT(I)/2.9
and  W(I,M) = WT(I)a/(1+a)  = WT(I)/1.53

Let P(I,M) = number of male pension plan members in province I
    P(I,F) = number of female pension plan members in province I

Then the participation rates are given by:

R(I,M) = P(I,M)/W(I,M) = P(I,M)(1.53)/WT(I)

R(I,F) = P(I,F)/W(I,F) = P(I,F)(2.9)/WT(I)

Using these relations the participation rates are as follows:


Province (I)    R(I,M)    R(I,F)
NFLD Ca) 239 b20 
PEI (2) MEX), oR 
NB (3) 33 oe 
NS (4) ESP: oe 
PQ (5) <45 cae. 
ONT (6) nis 34 
MAN C7) -46 Peet 
SASK (8) 41 ow 
ALTA (9) Poh) Pcie * 
BC (10) .38 2D 


These may be compared with Canada wide participation rates: 


Total = .41, males = .465, females =) 
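
The participation-rate calculation can be written out as a short function. The constant ratio of male to female paid workers (4483/2356) comes from the text; the function name, argument names, and the provincial figures in the usage note are hypothetical and are not taken from the table above.

```python
A = 4483 / 2356   # Canada-wide ratio of male to female paid workers


def participation_rates(members_male, members_female, total_workers, a=A):
    """Return (male rate, female rate) of pension plan participation.

    The province's total paid workers are split by sex using the
    national ratio a, as assumed in the text.
    """
    workers_female = total_workers / (1 + a)
    workers_male = total_workers * a / (1 + a)
    return members_male / workers_male, members_female / workers_female
```

For an illustrative province with 100 thousand male members, 40 thousand female members and 300 thousand paid workers, the function splits the workers into about 196.7 thousand males and 103.3 thousand females, giving rates of roughly .51 and .39.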
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6. Retirement Age for those in Receipt 
of Private Pensions 


The Statistics Canada publication, "Pension Plans in
Canada 1970" (#74-401) supplies data on the "normal retirement
age" of people who have private pension plans. Statistics
Canada defines "normal retirement age" as "the earliest age
at which a member may retire as a right and receive immediately
his full accrued pension without reduction, although it is
not necessarily the age at which he leaves the service of
his employer". The data are given in the following table:


Age            Males (%)    Females (%)
60 or less        10.3          28.6
61-64              0.6           1.3
65 TERS Syeye 
66 and over Dhos 8) Ibs 


The totals do not sum to 100% because some plans provide for
optional normal retirement ages based on some combination of
age and minimum service requirements. From this data we have
to derive actual retirement ages.


It can be assumed, to begin with, that most people
in the "optional normal retirement age group" will actually
retire somewhere between 60 and 65. Also, most of those whose
"normal" retirement age is less than 60, or is between 60
and 64 will undoubtedly actually retire somewhere in the 60-65
age bracket. Thus to determine actual retirement age, two
arbitrary decisions must be made: (a) those with optional
retirement ages must be distributed in the 60-65 age bracket;
and (b) those whose "normal" retirement age is less than 65
must also be distributed in the 60-65 age bracket. That is,
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for males, 21.5% (10.3+0.6+10.6) of the population must be so
distributed; and for females, 45.1% (28.6+1.3+15.2) must be
distributed.


If we distribute these uniformly, and assume that
all those whose "normal" retirement age is 65 or greater
actually do retire at 65, then we can derive the following
table.

        Actual Retirement Age

Age      Males (%)    Females (%)
60           3.6           7.5
61           3.6           7.5
62           3.6           7.5
63           3.6           7.5
64           3.6           7.5
65          82.0          62.5
           100.0         100.0
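
The uniform allocation behind this table can be reproduced with a short calculation. The sketch assumes that the share to be distributed is spread evenly over the six ages 60 to 65 and that the remainder is added at age 65; the function name is hypothetical.

```python
def actual_retirement_distribution(early_share, ages=range(60, 66)):
    """Spread early_share (in %) uniformly over ages; the last age also gets the remainder."""
    ages = list(ages)
    per_age = early_share / len(ages)          # equal share for each age 60..65
    dist = {age: per_age for age in ages}
    dist[ages[-1]] += 100.0 - early_share      # everyone else retires at the last age
    return dist
```

Calling the function with 21.5 (males) gives about 3.6% at each age from 60 to 64, and with 45.1 (females) about 7.5%. The age 65 entries come out within 0.1 of the tabulated 82.0 and 62.5, the small difference being due to rounding in the table.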


It must be noted that this age distribution applies 
only to those who will be in receipt of private pensions. 
For the others, who will receive no private pension, it will 


be assumed that retirement takes place at age 65. 


5.5.3 Adjustment Parameters 


1. CHANGE(I,J,K,L) = CHANGE(3,3,9,2)


This is the calibration term used to adjust the 
December-January transition probabilities. Since no dummy 
term exists for the December-January transition (because 
it would lead to matrix singularity) it is not possible to 
simply adjust the A matrix for this particular month. The 


calibration term must be added explicitly. 
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The indices are: 


I = row of matrix for which term applies
J = column of matrix for which term applies
K = age group (see description of A matrix)
L = sex: 1 = males, 2 = females

Source 


This data is derived by calculating the probabilities 
generated for the December (1970) - January (1971) 
transition by the original regression coefficients 
and then comparing these with the actual probabilities 


as obtained from the Labor Force Survey. 


2. RATIO(I,J) = RATIO(2,9)

These are the parameters that adjust the labor force
transition probabilities to account for the Type 1 - Type 2
distinction. The indices are:
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1 for total employed/type 2 employed 


2 for type 1 employed/type 2 employed 
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The array applies to males only. 
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Source 


This data was derived from the 1971 Survey of 


Consumer Finance. 
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3. DADJ(I,J,K) = DADJ(2,5,12)

These are the adjustment parameters used to adjust
the age-sex transition matrices so as to account for region
and marital status differences. DADJ(I,J,K) is the parameter
that accounts for the adjustment for marital status I, region
J and month K (transition is from month K to month K+1).

I = 1 not married
    2 married
J = 1 Atlantic
    2 Quebec
    3 Ontario
    4 Prairies
    5 British Columbia
K = 1, ..., 12


Source 


These parameters were derived from data for the
year April 1972 through April 1973. The labor
force survey in each of those months gave data on
the total employed and total unemployed broken
down by region and marital status. Simulation for
the same months gave slightly different totals.
Comparison of the two sets of data gave the
adjustment parameters, as discussed in section 4.2.4.
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5.5.4 Variable Parameters 


The following parameters are read in from cards. 


They define the particular year(s) being simulated. 
1. IMONTH


This is the first calendar month from which a
transition is to occur. Normally this month is April and
IMONTH = 4.


2. NMONTH


The total number of months in the simulation.
Normally one full year is simulated, April through April,
and NMONTH = 13. This means that the simulation will contain
NMONTH - 1 transitions plus one input month.


3. URATE(13)


This is the seasonally adjusted unemployment rate
vector for the 13 months expressed as a percentage. That
is, if the unemployment rate for month I is 6%, URATE(I) = 6.0.


4. KYR 


This is the year being simulated. If the base 
year is 1971, the first year simulated will be 1972 (April 


1972 to April 1973) and KYR will be input as 1972.
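
The relation between IMONTH, NMONTH and the number of transitions can be illustrated with a short sketch. The function name is hypothetical and is not taken from the POLSIM software; it assumes only that months are numbered 1 to 12 and wrap around at the end of the calendar year.

```python
def simulated_months(imonth, nmonth):
    """Return the calendar months (1-12) covered by a run.

    The first entry is the input month; each later entry is reached by one
    monthly transition, so there are nmonth - 1 transitions in total.
    """
    return [(imonth - 1 + k) % 12 + 1 for k in range(nmonth)]


# Normal case described in the text: April through April.
months = simulated_months(imonth=4, nmonth=13)
transitions = list(zip(months[:-1], months[1:]))
```

With these inputs the run starts and ends in month 4 and contains 12 transitions, which matches the size of the month index in the adjustment arrays.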
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6.1 The Market Income Model 


6.1.1 Introduction


The POLSIM model divides an individual's total annual 
income into four main components: employment income, property 
income, retirement income, and other money income. Employment 
income is income in the form of wages and salaries, military pay 
and allowances, and net income from farming, fishing, and other 
forms of self-employment. Property income is divided into two 
subcomponents: dividends and other investment income. Dividends 
are self explanatory. Other investment income consists, generally, 
of income from fixed face value assets such as bonds, deposits, 
and savings certificates, and all other forms of investment 
income. Retirement income consists of private pensions, super- 
annuation, and annuities. Other money income includes income 
from roomers and boarders, alimony, gifts, and income from any 


other source not mentioned above. 


The purpose of the Market Income Block is to update 
these income variables, on an annual basis, as the person moves 
from year to year through the simulation. Broadly speaking, this 
updating process consists of two general problems. The first is 
to assign initial component incomes to a person if he is "eligible" 
and if the particular component being examined was zero in the 
previous year. (A "component" of income is one of the four sources
mentioned above.) The second problem is to determine transitions 
on each of the income components that were not zero in the previous 
year. Both of these two kinds of processes are generally 


handled by either a deterministic function, a time-invariant 
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stochastic function, or a combination of these. The exact 
meaning of "eligibility" in the various cases, and the way 
in which the several transitions are effected, will now be 


discussed. 
6.1.2 Overview of the Model 


A micro-flow chart of the Market Income Block is shown 
in figure 6.1. It can be seen that the block begins by reading 
and assembling all of the parameters that will be necessary to 
update an individual's income variables. All of the constant 
parameters, which consist of transition matrices, initial income 
arrays, growth factors, etc., are read in from a single data tape. 
They are then simply stored for use by the relevant subroutines. 
The variable parameters are the particular year being simulated and 
the rate of inflation that is assumed to apply throughout the simulated
period. Both of these variables are read in from cards.


After reading in the rate of inflation, the program is 
able to calculate a consumer price index vector. This is simply 
the consumer price index for the 15 years from 1967 to 1981. It
will be used to calculate money growth in certain components of
a person's income. The CPI vector is the last parameter necessary
to effect individual transitions. All of the other sets of data
have already been stored, and the program is able to begin to 


read and process individual records. 
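
The text does not spell out how the CPI vector is built from the inflation rate. A minimal sketch follows, assuming that observed CPI values are carried for past years and that later years are projected at the constant rate read from cards. The function name, the treatment of the rate as a percentage, and the 5% figure are illustrative assumptions only.

```python
def build_cpi_vector(known_cpi, rinfl, last_year=1981):
    """Extend a table of known CPI values forward at a constant inflation rate.

    known_cpi: dict mapping year -> CPI for years already observed.
    rinfl: assumed annual inflation rate in percent (e.g. 5.0 for 5%).
    """
    cpi = dict(known_cpi)
    year = max(cpi)
    while year < last_year:
        cpi[year + 1] = cpi[year] * (1.0 + rinfl / 100.0)
        year += 1
    return cpi


# The 1969 and 1970 values (1961 = 100) are those quoted later in this chapter;
# the 5% inflation rate is a placeholder.
cpi = build_cpi_vector({1969: 125.5, 1970: 129.7}, rinfl=5.0)
```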


Before the income updating processes begin, the program 
checks to see if the individual is less than 14 years of age. 
Children less than 14 are assumed by POLSIM to receive no income, 


and consequently the income variables of all such children are 


not updated. 
[Figure 6.1: Micro-flow chart of the Market Income Block. Legible labels: MARKET; KYR and RINFL on cards; output tape from Activity Block.]
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All other persons are assumed to be eligible for employ- 
ment, property and other kinds of income. The way in which the 
transitions on these kinds of income are handled will be discussed 
in Section 6.3 below. For employment and retirement incomes, 
which depend on one's relation to the labour force, a variety of 
processes exist. The exact transformations that these two sources
of income are subject to depend on certain elements in the individual
state vector. That is, they depend on just what "kind" of
person the given individual is.


The main criterion for determining how a person's 
employment and retirement income will be updated is his "TYPE". 
If his TYPE is greater than 100 then he is a person who has 
entered the labour force for the first time in the current year, 
and he consequently must have an initial income assigned to him. 
His retirement income will be zero. If his TYPE is 3, then he is 
either a student or a full-time member of the non-labour force 
(and not retired). In the former case, an employment income must 
be assigned if the student worked during the summer. In the latter 
case, no employment income is assigned. In neither case is a 
retirement income assigned. If his TYPE is 4 or 5 then he is 
retired. His retirement income is assumed to be constant, and 
his employment income is zero. If his TYPE is 14 or 15 then he
is a Class A employed person. He is assumed to be employed for 
the full year, and his employment income is determined by an 
annual employment income transition. His retirement income is 
of course zero. If the person is TYPE 24 or 25 then he is a 
Class B member of the labour force. The Activity block has 
previously determined how many weeks he has worked, and the 


Market Income block now determines any change in his weekly wage. 
His employment income for the year is then the product of these
two variables. His retirement income is zero. If the person is
TYPE 40, then this is his first year of retirement (he has reached
age 65), and he is not eligible for a private pension. In the
present version of the model he is then assumed to have no
private retirement* or employment income at all. If a person is
TYPE 20 then he has just retired and he is eligible for a private
pension. This pension is calculated and becomes the whole of his
retirement income. His employment income is zero.
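
The TYPE rules above amount to a dispatch table. The sketch below restates them in code form. The function name and the rule labels are hypothetical, and the codes 40 and 20 for the two first-year retirement cases are taken as printed in this copy of the report, where those lines are poorly legible.

```python
def income_treatment(person_type):
    """Return (employment_rule, retirement_rule) for a TYPE code."""
    if person_type > 100:
        # First entry to the labour force in the current year.
        return ("assign initial income", "zero")
    if person_type == 3:
        # Student or full-time member of the non-labour force.
        return ("student summer income, else none", "none assigned")
    if person_type in (4, 5):
        return ("zero", "held constant")
    if person_type in (14, 15):
        # Class A: assumed employed for the full year.
        return ("annual income transition", "zero")
    if person_type in (24, 25):
        # Class B: weeks worked come from the Activity block.
        return ("weekly wage transition x weeks worked", "zero")
    if person_type == 40:
        # First year of retirement, no private pension.
        return ("zero", "zero")
    if person_type == 20:
        # First year of retirement, eligible for a private pension.
        return ("zero", "private pension calculated")
    raise ValueError(f"unrecognised TYPE code: {person_type}")
```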
6.1.3 The Income Processes 
1. Initial and Student Incomes


This process applies to all persons who must be assigned 
an initial employment income, and to all students with some summer 


employment. 


A mean full employment income in 1971 dollars is first 
computed deterministically, on the basis of the person's age, sex, 
education, marital status, and province. Real economic growth is 
then applied to this income, depending on an exponential growth 
factor. This growth factor is a function of the age, sex, and 
education of the person under consideration. Multiplication by 
the growth factor results in current year real income (the current 
year is the year being simulated) expressed in 1971 dollars. This
is then inflated by the change in CPI, giving current mean full 


employment income in current dollars. 


*This does not comprehend public programs such as CPP or QPP. 




The person's actual employment income is then calculated. 
If the person is a Class A individual, his employment income 
remains as the income just calculated. Since Class A persons are 
assumed to never become unemployed, no changes from the full 
employment income are necessary. If the person is Class B, then 
it is possible that he may not have worked a full year. The full
employment income is therefore converted to a weekly rate, and the 
individual's employment income becomes the product of his wage 
rate and the number of weeks he worked. If the person is a student 
he is treated in almost the same manner as a Class B person. The
only difference is that a multiplicative factor of .7 is introduced
into the wage rate to account for the difference between student
summer wages and the equivalent wage paid to full-time employees.
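
The initial and student income process can be sketched as two small functions. This is an illustration under stated assumptions rather than the POLSIM code: the 52-week conversion, the form exp(rate x years) for the exponential growth factor, and all function and parameter names are assumptions. The student factor is read as .7 from a poorly legible line.

```python
import math

WEEKS_PER_YEAR = 52        # assumption: used to convert annual income to a weekly rate
STUDENT_WAGE_FACTOR = 0.7  # multiplicative factor applied to student wage rates


def full_employment_income(mean_income_1971, growth_rate, year, cpi):
    """Mean full employment income in current dollars.

    growth_rate is a hypothetical annual real growth rate; the text says only
    that the growth factor is exponential and depends on age, sex and education.
    """
    real_income = mean_income_1971 * math.exp(growth_rate * (year - 1971))
    return real_income * cpi[year] / cpi[1971]


def employment_income(full_income, kind, weeks_worked=WEEKS_PER_YEAR):
    """Actual employment income for a Class A person, Class B person or student."""
    if kind == "A":
        # Class A persons are assumed never to become unemployed.
        return full_income
    weekly_rate = full_income / WEEKS_PER_YEAR
    if kind == "student":
        weekly_rate *= STUDENT_WAGE_FACTOR
    elif kind != "B":
        raise ValueError(f"unknown kind: {kind}")
    return weekly_rate * weeks_worked
```

For example, a Class B person with a full employment income of $5,200 who worked 26 weeks would be assigned $2,600 under these assumptions.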
2. The Annual Income Transition Process


The central feature of this process is an annual wage 
transition matrix, stratified on age and sex, which defines the 
probability of a person moving from one income class to another. 
The income classes of this matrix are defined in 1970 dollars. 
Since the person's income will be expressed in current year 
dollars, the first step in the transition process is to deflate 
the person's income to 1970 dollars. This is done by simply 
multiplying the person's income by the ratio of the relevant
CPIs. It is then necessary to find the particular 1970 income
class into which the person falls, and his position in this class,
relative to the lower bound of the class. Once this is done, the 
person's new income class is determined stochastically. A random 
number is drawn, and the new class is assigned depending on the 
value of this number and the cumulative transition probability 


distribution relevant to the particular initial income class in 
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question. The person's exact income in the new income class is 
then calculated. He is assigned the same relative position in
the new class that he held in the previous year's class. The
income thus computed is still expressed in 1970 dollars. It is
therefore inflated to current dollars to reflect the change in
the CPI over the particular given period.
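
The five steps of the annual transition (deflate, locate class and relative position, sample a new class, restore the relative position, inflate) can be sketched as follows. All names are hypothetical, and the sketch assumes every income class has finite lower and upper limits, which the report does not state.

```python
import bisect
import random


def transition_income(income, cpi_now, cpi_next, cpi_base, bounds, cum_probs, rng=random):
    """Move one income through a class transition matrix defined in base-year dollars.

    bounds: ascending class limits; class k spans bounds[k] to bounds[k + 1].
    cum_probs[k]: cumulative transition probabilities out of class k.
    """
    # 1. Deflate the current-dollar income to base-year dollars.
    real = income * cpi_base / cpi_now
    # 2. Locate the class and the relative position above its lower bound.
    k = min(max(bisect.bisect_right(bounds, real) - 1, 0), len(bounds) - 2)
    position = (real - bounds[k]) / (bounds[k + 1] - bounds[k])
    # 3. Draw a random number and pick the new class from the cumulative row.
    u = rng.random()
    new_k = min(bisect.bisect_left(cum_probs[k], u), len(cum_probs[k]) - 1)
    # 4. Keep the same relative position in the new class.
    new_real = bounds[new_k] + position * (bounds[new_k + 1] - bounds[new_k])
    # 5. Inflate back to current dollars of the new year.
    return new_real * cpi_next / cpi_base
```

The same routine would serve for the weekly wage transitions by passing weekly wage classes and the 1969 CPI as the base.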


3. The Weekly Wage Transition Process


The process here is virtually identical to the one
described above. The only difference is that the transition
matrices are defined by weekly wage rates in 1969 dollars.* The
transition process from the old wage rate in current dollars to 
the old wage rate in 1969 dollars, and thence to the new wage 
rate in 1969 dollars and the new wage rate in current dollars 
is exactly the same, mutatis mutandis, as described above. Once 
the new wage rate is calculated, the person's employment income 
for the year becomes his weekly wage rate multiplied by the 


number of weeks he worked. 


4. The Retirement Income Process


The assignment of initial retirement income is quite 
straightforward. A probability distribution, contingent on sex, 
defines the percentage of last year's annual earnings that the 
person will receive as pension. This distribution is sampled 
and the relevant percentage determined. The person's retirement 
income is then just this percentage of his last year's annual 


earnings. 
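
A minimal sketch of the pension assignment, assuming the distribution is stored as a list of percentages with matching cumulative probabilities for each sex. The data layout and names are hypothetical; no actual POLSIM parameter values are shown.

```python
import bisect
import random


def initial_pension(last_year_earnings, sex, pension_dist, rng=random):
    """Sample a replacement percentage by sex and apply it to last year's earnings.

    pension_dist[sex] is a (percentages, cumulative_probabilities) pair.
    """
    percentages, cum_probs = pension_dist[sex]
    u = rng.random()
    # First entry whose cumulative probability is at least u.
    idx = min(bisect.bisect_left(cum_probs, u), len(cum_probs) - 1)
    return last_year_earnings * percentages[idx] / 100.0
```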


*The reason why the weekly transition matrices are in 1969 dollars 
and the annual transition matrices in 1970 dollars is that no data 
existed on weekly wage transitions for the years 1970-71. The last 
years of data for weekly transitions are 1969-70 and the last years 
for annual transitions are 1970-71. (cf. Section 6.2.1.)
5. The Property Income Processes


It will be recalled that property income consists of
two components, dividends and other investment income. The
transition processes that apply to dividends are exactly equivalent
to those that apply to other investment income. The only
difference between the two is that the probabilities defining the
two processes are different. But the processes themselves, the 
definition of the cells in the transition matrices, and the 
variables that the matrices are stratified on, are exactly the 
same. The discussion below will therefore be in terms of one 
process only: one applicable to "property income". As the flow 
chart in figure 6.1 indicates, however, what really takes place 
in the program are two sequential identical processes, the first 


for dividends and the second for other investment income. 


The updating of "property income" can be divided into 
two sections, depending on how much initial property income the 
person in question has. The first applies only to persons whose 
property income is less than $250 and embraces those persons who 
are moving from zero to non-zero property incomes for the first 
time. For people in this group, transitions to other classes 
depend on both age and income. A random number samples a cumulative
probability distribution which is contingent on the person's
age and income. The sampling defines the new property income
class, and hence the new property income (which is taken to be 


the midpoint of the class). 


The second set of transitions, which apply only to those 
whose initial property income is greater than $250, is virtually 


identical to the first. The only difference is that the transition 
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matrices depend on age only, rather than age and income. 

The idea behind this distinction is that total income is
relevant in determining the level of property income a
person is likely to achieve in the first instance. That is,
the larger the person's total income, the larger his savings
are likely to be. And the larger the person's savings, the
larger his initial property income. Whether this initial
amount of savings is then further built up or drawn down
depends mainly on the stage of the life cycle, i.e., age.


6. The Other Money Income Process


Other money income consists of room and board 
income, alimony, and other small items of income which are 
difficult to simulate. Because this kind of income is 
small, and confined to very few people, it is handled in a
simple deterministic way. The only persons allowed to have
other money income are those who had some in the previous 
year. The size of the current year amount is increased, 


however, to reflect any changes in the CPI. 


7. Total Income and Output of the Market Income Block


Once all of the components of an individual's 
income have been determined, his total income for the simulated 
year is computed by simple summation. This total is then 


recorded in the individual's state vector. 
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6.2 The Market Income Block Parameters 


The following is a brief discussion of all of the 
parameters that are input to the Market Income Block. 
Section 6.2.1 discusses the estimation of the employment 
income parameters. Section 6.2.2 does the same for the
property income parameters, and Section 6.2.3 for the retirement 
income parameters. Appendix E should be consulted, inter alia, 
for the definition of each set of parameters and the detailed 


indices embodied in each. 


Section 6.2 as a whole is not concerned with estimation 
procedures as such. For the most part, the Market Income Block 
parameters do not consist of data estimated by regression techniques, 
interpolation, and so on. They are rather data (cumulative proba- 
bilities for the most part) derived from micro-data files compiled 
by Statistics Canada, the Unemployment Insurance Commission, and 
the Department of National Revenue. The discussion that follows 
is mainly concerned with indicating "why" a given set of data 
was compiled, rather than some other set. The computer programs, 
the adjustments to the raw data, and the other details of the 
compilation process itself are mentioned in only a peripheral 


way. 


6.2.1 Employment Income Parameters 


The existence of employment income as a major component 
in a person's state vector presents two problems for POLSIM. The 
first is the establishment of initial employment income for persons 


who enter the labour force for the first time during the course of 
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the simulation. The second problem is that of effecting transi- 
tions between income classes for employed persons as they move 

from year to year. The following discussion details the way in
which these problems are handled by the Market Income Block, and 


the parameters that are necessary to effect the various processes. 


1. Income Transitions


(a) General 


For purposes of employment income transitions, POLSIM 
divides the employed population into two mutually exclusive 
classes. We may designate the people in these classes as "Class A" 
persons and as "Class B" persons. Class A persons are those for 
whom the concept of unemployment has no precise meaning, (the 
self-employed for example), or those who are extremely unlikely 
to ever experience unemployment. More precisely, Class A persons 
are males who are either self-employed, or who are employed in 
managerial, professional, or technical occupations. All other 
employed persons are Class B persons. These are people who are 
likely to leave the employed state (for either the non-labour 
force or the unemployed state) once or many times during their 


working lives. 


The reason for distinguishing between Class A and Class B 
persons has to do with the obvious fact that employment income 
depends on the extent to which a person is employed. Employment 
income for a year is some wage rate multiplied by a period of 
employment. Annual income can thus change if wages change, if the 
period of employment changes, or if both of these factors change. 


The most general approach to income change would be to let both 
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determinants vary from year to year. But if there exists some 
group for whom employment changes are not meaningful (they can 
be assumed to be always fully employed) then it would be both 
more realistic and simpler in dealing with these people to 


consider annual wage changes only as the sole determinant of 


income change. 


This is the strategy that is adopted with respect 
to Class A persons. It is assumed that they never become 
unemployed and that their income changes are hence determined
solely by changes in their annual wage. For Class B persons,
on the other hand, it is necessary to examine changes in
both weekly wage rates and number of weeks worked.


The Activity Block deals with changes in annual 
weeks worked. (See Chapter 5.) The major problem faced by 
the Market Income block is to obtain the income changes that 
are applicable to the two classes of persons described
above. That is, it is necessary to obtain a weekly wage 
rate transition matrix that is applicable to Class B persons, 
and an annual employment income transition matrix that is 
applicable to Class A persons. To obtain these two kinds of 


matrices we make use of the UIC-DNR data base. 
(b) The UIC-DNR Data Base 


This data base was originally produced for the Unemployment
Insurance Commission to assist in the analysis of proposed
new unemployment insurance schemes. It consists of data describing 
the demographic, financial, and employment characteristics of 2% 


of the Canadian working population. The data base was compiled 
from two main sources: Statistics Canada and the Department of
National Revenue. The Statistics Canada files consisted essentially
of samples of UIC administrative records, with occupational
and industry codes added by Statistics Canada, as well as survey
data collected as a joint UIC-Statistics Canada project. The DNR


files supplied information from income tax returns. 


Persons whose SIN number ends in specified digits, for a total
of approximately 250,000 individuals, comprise the UIC-DNR sample. 
Creating the data base consisted in a sequential matching of the 
two basic data files to the sampled individual. Individual files 
on the insured population, contributions paid and benefits received 
were obtained from Statistics Canada. These were merged together 
by matching SIN (if the data existed for the given SIN) to form 
one file containing all information received from Statistics 
Canada. At the same time the same sample of individuals was 
drawn from the DNR files on income tax returns (if a return existed 
for the given SIN). A final merge was then made combining the SIN 
master file sample, the Statistics Canada file, and the DNR tax 


returns file. 


The data base thus compiled contained the following data 
that was relevant for our purposes: demographic characteristics 
(age, sex, marital status, province), the various components of
income a person might have (wages and salaries, business income, 
etc.), the number of weeks worked, and whether or not the person 
paid unemployment insurance premiums. These data exist for the 


7 years 1965 through 1971 inclusive. 
(c) Class B Persons and Weekly Wage Rates


In constructing transition matrices for Class B persons,
it is first necessary to identify the subset of the UIC-DNR data
base that consists of Class B persons. Roughly speaking, Class B 
persons are those who are likely to experience unemployment. 
During the period for which the data exists, 1965-71, the insured 
population was approximately coterminous with this group: prior 
to June 1968 the insured population consisted mainly of all wage 
earners, and all salaried workers earning less than $5,400; after 
June 1968 coverage was extended to all wage earners and to all
salaried workers earning less than $7,800. It is thus not 
unreasonable to derive weekly wage transition matrices from 
that subset of the data base which contains all persons who had 
UIC contribution records in this period. The contribution record
contains data on the number of weeks worked. The annual wage 
income for the person is taken from the DNR record. From these 
it is possible to calculate weekly wage rates. And, by obtaining 
the person's wage rate in two consecutive years, a transition 


matrix can be derived.


(d) Class A Persons and Annual Employment Income


If a person has a DNR record on the data base, but has 
no UIC record in any of the 7 years, then it is reasonable to 
assume that he is a Class A person. That is, he is a person who
is unlikely to ever become unemployed. All self-employed people 
will fall into this class, as well as salaried workers with high 


incomes. All wage earners will be excluded. From the DNR record 
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for these sorts of individuals it is possible to derive employment 
income (which is the sum of wages, commissions, business net income, 
professional net income, farming net income, and fishing net income). 
And since employment income thus defined will exist for several 
consecutive years, annual employment income transition matrices 


can be derived. 


Annual employment income transitions for all persons are 
thus handled in a conceptually simple way. For Class A persons,
annual transitions determine how their employment income from all 
sources changes. Class B persons are assumed to have employment 
income from only one source, wages, and this income is changed 
by determining transitions in both weeks worked and in weekly 


wage rates. 


(e) Stratification of the Transition Matrices 


Stratification is the process wherein a body of data
is grouped into a number of disjoint classes. In the case of
income transition matrices, for example, it is desirable to have
different matrices for different age groups, sex classes, regional
classes, and so on. If income transitions are significantly
different for different subsets of the population, then stratification
will yield far more realistic results.


Unfortunately, the use of stratification variables 
creates a dilemma. On the one hand, one would like to stratify
on all variables that are significant in explaining differences
in the data. But if this is done, it usually turns out that the


number of cells one ends up with is so large that the resulting 
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distributions are statistically meaningless. For example, if 
we are constructing transition matrices with 10 income classes, 
and if we wish to stratify on age (5 classes, say), marital
status (2), province (10), and sex (2), then we will have
20,000 cells into which our data could fall (10x10x5x2x10x2).
Since the number of records available for computing weekly wage 
rate transition matrices is approximately 100,000, it is clear 
that most of these 20,000 cells would contain very few, if any, 
observations. Obviously, then, we have to restrict the number 
of stratification variables if we are to have any confidence 


at all in the statistical veracity of our derived matrices. 
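
The arithmetic behind this example is easy to check. The sketch below uses the class counts from the example above; the variable names are illustrative only.

```python
from math import prod

# Class counts taken from the example in the text.
classes = {
    "origin income class": 10,
    "destination income class": 10,
    "age": 5,
    "marital status": 2,
    "province": 10,
    "sex": 2,
}
n_cells = prod(classes.values())

n_records = 100_000  # approximate number of weekly wage rate records
mean_per_cell = n_records / n_cells
```

Even if the records were spread evenly, each cell would hold only about five observations. In practice they cluster near the diagonal, so most cells would hold far fewer.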


The question is, how much restriction is necessary? 
The table given in Appendix E indicates that to ensure reason- 
able statistical reliability, we need at least 100 observations 
for any row of a given matrix. This number is strictly correct
if we are dealing with 2x2 matrices. For larger matrices, larger 
numbers of observations would be required. It is not necessary, 
however, to have this many observations in every row of a given 
matrix. What we desire is some confidence in the matrix as it 
will be used. And to achieve this, we require that there be a 
reasonable number of observations in the rows of the matrices 
(or even the cells of a given row) that will apply to the great 
majority of people. The fact that there are very few observations 
on people moving from very high wage rates to very low wage rates 
need not worry us very much. What we do want to ensure, however,
is that there are a reasonable number of observations around the 


diagonals of the derived matrices. 
The above paragraph abounds in such obscurities as
"reasonable", "larger", "some confidence", and so on. The problem
is that it is very difficult to come up with some precise statistical
measure of how adequate a whole matrix of observations is.
The method outlined in Appendix E gives at best a very crude idea
of what kinds of numbers to look for. In addition, we do not in
fact wish to assess the significance of the matrix as a whole. 

To reiterate, we do wish to have confidence in the transitions 
that apply to the large majority of people in the simulation. 

And this means that we want to look for a "reasonably large" 
number of observations around the diagonals. Selecting strati- 
fication variables is thus as much an art as it is a science, and 


the rationale set out below reflects this fact.


It was first assumed that the critical stratification
variables were age, sex, and region. Transition matrices stratified 
on these variables were then derived. Five age classes were chosen 
(14-24, 25-35, 36-45, 46-64, 65+), and combined with the 5 regional
classes and 2 sex classes yielded 50 matrices. As expected, some 
of these matrices were so sparse as to be meaningless. The problem 


was then to reduce the number of stratifications. 
(f) Weekly Wage Rate Transitions (Class B Individuals)


Inspection of the data indicated that all three of the
stratification variables were important. That is, the transition
matrices were different for different age classes, different sex
classes, and different region classes. The differences were what
one would expect a priori. Males have higher probabilities of


increasing their wages than females. Younger people have higher 
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probabilities of increases than older people. And people in 
Ontario, for example, have greater chances of increases than 


people in the Atlantic provinces. 


There is thus no obvious rule by which to eliminate any 
of these stratifications. All that can be done is to eliminate the
least significant ones. There are very large differences between
males and females, and so these must be kept. This reduced the 
choice to either region or age as the variable to be eliminated. 
Of these two, age is much more critical. People in the 14-24 
age group, for example, tend to have many more increases in 
wages than those in older age classes, where wage rates tend 
to be more stable. Differences between regions are not nearly 
so marked. Since it was necessary to eliminate at least one of 
the stratifications, the choice thus fell to region, as the least 
critical of the 3 possibilities, and this stratification was in 


fact eliminated. 


Within the age stratification, it was found that the 
65+ group was virtually empty. It was therefore decided to 
aggregate this group with the 46-64 group. The final stratification
then consisted of 8 disjoint classes: two sexes and 4 age


groups (14-24, 25-35, 36-45, 46+). 


(g) Annual Employment Income Transitions (Class A 
Individuals) 


Much the same behaviour was observed when annual wage 
matrices were compared. The conclusions reached by inspection 


of annual wage matrices were then applied in the construction 
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of annual employment income matrices. The reason for this pro- 
cedure was that the disaggregated wage matrices already existed, 
whereas the corresponding employment income matrices did not, and 
it was hence not possible to directly inspect the employment income 
matrices. The disaggregated income matrices could have been 
produced, but because of the high cost of doing this it was decided 
to produce only the final aggregated set. Since wages are the 
largest component of income, it is unlikely that the conclusions 
thus reached would have been any different had the complete set 

of income matrices, stratified on all 3 variables, been produced 

as well. The only difference between the weekly and annual data 
was that for the annual employment income transitions (which are 

to apply to Class A persons) there were very few observations on 
the 14-24 age group. This group was therefore aggregated with 


the 25-35 age group.


The final stratification for annual employment income 
change then consisted of six classes: two sexes and three age 


groups (14-35, 36-45, and 46+). 


(h) Time Series Analysis 


The UIC-DNR data base provides 5 observations on weekly 
wage rate transitions (1965-66 to 1969-70) and 6 observations on 
the annual wage rate transitions. Given this data one could 
proceed to examine such questions as the degree to which the 
matrices vary with time, the extent to which changes can be 


explained by inflation or other macro-variables, and so on.

Unfortunately, these kinds of analyses are beyond the
scope of the present study. What has been done here is to construct
transition matrices that represent changes in real income
(1969 dollars in the case of weekly matrices; 1970 dollars in
the case of annual matrices), and it was assumed that these
matrices are time invariant. These matrices were obtained for
the final year of data in each case (1969-70 for weekly, 1970-71 
for annual). The truth of the time invariance assumption remains 
an empirical question, one that hopefully can be answered in 


future work. 


(i) Inflation and the Construction of Income Transition
Matrices 


The procedure adopted was to deflate the higher year's
incomes by the change in the consumer price index. Consider, for
example, the case of weekly wage rates. The raw data here consisted
of a record containing a person's money income in 1969
and his money income in 1970 (as well as the stratification
variables, age and sex). The CPI in 1969 was 125.5 (1961 = 100)
and in 1970 it was 129.7. The person's 1970 income was therefore
divided by 129.7/125.5 = 1.033, to obtain his 1970 income in 1969
dollars. His 1969 money income and his deflated 1970 money income
then determined the cell of the matrix that was to apply to him.
All individuals were counted in this way, thus deriving the required
matrices. In the same manner for annual transitions, 1971 income
was deflated by dividing it by 1.0285. This enabled 1971 incomes
to be expressed in 1970 dollars.
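The counting procedure can be illustrated with a minimal sketch. The class limits and the four income records below are hypothetical values chosen only for illustration; the CPI figures (125.5 and 129.7) are those quoted in the text. The function names are assumptions of this sketch and are not taken from the POLSIM software.

```python
def deflate(income, cpi_income_year, cpi_base_year):
    """Express a money income in base-year dollars."""
    return income / (cpi_income_year / cpi_base_year)


def income_class(income, upper_limits):
    """Index of the first class whose upper limit is not exceeded.

    The last class is open-ended (assumption of this sketch).
    """
    for k, limit in enumerate(upper_limits):
        if income <= limit:
            return k
    return len(upper_limits)


def count_transitions(records, upper_limits, cpi_first, cpi_second):
    """Count persons in each (first-year class, second-year class) cell."""
    n = len(upper_limits) + 1
    counts = [[0] * n for _ in range(n)]
    for income_first, income_second in records:
        i = income_class(income_first, upper_limits)
        real_second = deflate(income_second, cpi_second, cpi_first)
        j = income_class(real_second, upper_limits)
        counts[i][j] += 1
    return counts


def to_probabilities(counts):
    """Convert each row of counts into a row of transition probabilities."""
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([c / total for c in row] if total else [0.0] * len(row))
    return rows


# Hypothetical weekly wage class limits (1969 dollars) and records
# of (1969 money income, 1970 money income).
UPPER_LIMITS = [50.0, 100.0, 150.0]
RECORDS = [(40.0, 52.0), (90.0, 95.0), (90.0, 110.0), (160.0, 170.0)]

counts = count_transitions(RECORDS, UPPER_LIMITS, 125.5, 129.7)
probabilities = to_probabilities(counts)
```

In practice one such matrix would be built for each age-sex stratum, with the records restricted to persons in that stratum.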
The matrices thus derived have a very specific meaning.
In the case of the weekly matrices, we can express this meaning
as follows:

Let Pij(a,s) be an element of a given matrix derived
as described above;

and let a person's income in any year t be such that
if it is expressed in 1969 dollars it will fall into
income class i;

Then if the person is in age class a in year t, and
sex class s, Pij(a,s) is the probability of moving
to income class j in year t + 1, where income class
j is defined by limits expressed in 1969 dollars.


(j) The Income Transition Program 


After a person's state vector has been read, and it has 
been decided that he will make an income transition, the process 
proceeds as follows. (The description is for weekly wage rate 
transitions. With the requisite changes, the process for annual 


change is identical). 


(i) The person's age and sex class are determined, thus
defining the relevant transition matrix.

(ii) The person's wage is deflated to 1969 dollars.

(iii) The person's wage class is then determined, and his
position in that class, relative to the maximum
income in the class, is noted.

(iv) The income transition (via random sampling of the
relevant row of the transition matrix) then defines
his wage class in year t + 1.

(v) His wage in year t + 1 (in 1969 dollars) is determined
by placing him in the same relative position in the
new class that obtained for him with respect to the
old class in year t.

(vi) This wage is then inflated to current dollars in
year t + 1. (The inflation factor in steps (ii) and
(vi) is the change in the CPI).


The assumptions embodied in the above procedure are that 
there exists a time invariant transition matrix explaining real 
changes in income from year to year, and that money changes in 
income can be described by a multiplicative change in the deter- 
mined real incomes. The multiplicative factor is assumed to be 
defined by changes in the CPI. This same dichotomization of real 
and money incomes is also assumed with respect to the assignation 


of initial incomes discussed immediately below. 
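Steps (ii) to (vi) of the income transition program can be sketched as follows. This is an illustration under stated assumptions and not the POLSIM code: the class limits, the transition matrix and the CPI values are hypothetical, every class is assumed to have a finite upper limit, and "relative position" is interpreted here as the fraction of the class width lying below the person's wage.

```python
import random


def find_class(real_wage, class_limits):
    """Return the index of the class containing real_wage.

    class_limits is a list of (lower, upper) pairs; wages at or above
    the top limit are placed in the last class (assumption).
    """
    for k, (lower, upper) in enumerate(class_limits):
        if lower <= real_wage < upper:
            return k
    return len(class_limits) - 1


def transition_wage(wage, matrix, class_limits, cpi_base, cpi_now, cpi_next, rng):
    # Step (ii): deflate the current wage to base-year dollars.
    real_wage = wage * cpi_base / cpi_now
    # Step (iii): find the wage class and the relative position within it.
    i = find_class(real_wage, class_limits)
    lower, upper = class_limits[i]
    position = (real_wage - lower) / (upper - lower)
    # Step (iv): sample the new class from row i of the transition matrix.
    j = rng.choices(range(len(matrix[i])), weights=matrix[i])[0]
    # Step (v): same relative position in the new class.
    new_lower, new_upper = class_limits[j]
    new_real_wage = new_lower + position * (new_upper - new_lower)
    # Step (vi): inflate to current dollars of the following year.
    return new_real_wage * cpi_next / cpi_base


# Hypothetical inputs: three wage classes and a matrix whose rows are
# degenerate, so that the outcome of the sampling step is certain.
LIMITS = [(0.0, 50.0), (50.0, 100.0), (100.0, 200.0)]
MATRIX = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

new_wage = transition_wage(82.5, MATRIX, LIMITS, 100.0, 110.0, 121.0,
                           random.Random(0))
```

With these inputs the wage of 82.5 deflates to 75 (the midpoint of the second class), moves to the third class, is placed at its midpoint (150), and is inflated to 181.5.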
Initial Incomes
(a) The 1971 Data Set 
The problem of "Initial Incomes" is very easy to formulate.
The POLSIM model will cause, each year, certain individuals


to enter the labour force for the first time. These people will
be either students graduating from school, or people such as
housewives who leave the non-labour force and obtain employment.
These sorts of people will have no income at all (or a very small
income obtained from part-time or summer employment) in the year
previous to the one in which they became full-time labour force
participants. It would therefore be unrealistic to apply an
ordinary income transition to these people, since these transitions 
are meant to apply only to people who are full-time labour force 
participants in both years being considered. What is necessary is 
to assign to new entrants an initial income that takes cognizance 
of their age, sex, education level, and perhaps other factors as 


well. 


The income that is to be assigned would be a weekly wage 
rate in the case of Class B persons, and a full employment annual 
income in the case of Class A persons. "Income from employment" 
in the latter assignment is defined as the sum of wages and salaries, 


and net income from self-employment. 


Data with which one can solve the initial income problem
is far from ideal. What one would like to have is a joint distribution
of the incomes of first-time labour force participants
cross-classified by all of the relevant individual characteristics.
Unfortunately, such data simply doesn't exist. One is thus forced
to examine all of the data that does exist, and piece together as


large a joint distribution as possible, on the basis of reasonable 


assumptions. 


Richard Arnott has carried out this exercise, as part of 


an Education Finance Study undertaken by the Institute for Policy
Analysis at the University of Toronto. He calculated a distribution
of 1971 mean full employment incomes cross-classified
by age (57 categories), sex (2), education (10), marital status
(3), and province (10). More specifically, his data is cross-classified
as follows:

(i) 57 years of age (ages 14 through 70)


(ii) sex 


(iii) 10 education classes: 


no schooling 

some elementary 

elementary completed 

some high school 

high school completed 

some university 

community college graduate 
Bachelor's degree 

Master's degree 


PHD 


(iv) Marital Status 


single 


married 


other 


(v) The ten provinces

This is a very extensive and useful set of data, particularly
since it breaks education down into such fine categories.
The only difficulty with it is that it applies only to the year
1971. Since the POLSIM model will operate for years subsequent
to 1971, it is necessary to adjust the data to reflect inflation
and economic growth. The whole procedure for assigning an initial
income to a given person will then consist of the following steps.


(i) An individual's 1971 mean full employment income
will be calculated on the basis of the Arnott
data;


(ii) This income will then be adjusted to account for 


inflation and economic growth; 


(iii) If a person is designated as Class B, then this 
income will be divided by 52 to give a weekly 


wage rate. 
It remains now to explain step (ii) above. 
(b) Inflation and Growth 
Conceptually, this problem is quite straightforward. 
We begin with the person's 1971 income in 1971 dollars. What 


we want is his 1975 income (say) in 1975 dollars. We proceed 


as follows:

Let g(a,e,s) be an exponential growth factor defining
the yearly growth in real incomes for someone in a
particular age, education, and sex class.

Then if $Y_R^{71}$ is his real income in 1971, his real
income (1971 dollars) in year 1971 + t will be

$$Y_R^{71+t} = Y_R^{71} \, e^{g(a,e,s)\,t}$$

We now have the person's real income in the required
year, but expressed in 1971 dollars. If CPI(t) is
the consumer price index in year 1971 + t, the person's
money income in year 1971 + t will be

$$Y_R^{71+t} \times \frac{CPI(t)}{CPI(0)}$$
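The three steps of the initial income assignment can be sketched as below. The sketch assumes that real growth compounds as exp(g t), which is one reading of the term "exponential growth factor"; the function name, argument names and the numerical inputs are hypothetical.

```python
import math


def initial_income(income_1971, growth, years_after_1971, cpi_t, cpi_1971,
                   class_b=False):
    """Project a 1971 mean full employment income to year 1971 + t.

    growth is the yearly real growth factor g(a,e,s); exponential
    compounding is an assumption of this sketch.
    """
    # Real income in year 1971 + t, still expressed in 1971 dollars.
    real_income = income_1971 * math.exp(growth * years_after_1971)
    # Money income in year 1971 + t.
    money_income = real_income * cpi_t / cpi_1971
    # Class B persons receive a weekly wage rate rather than an annual income.
    return money_income / 52 if class_b else money_income


# Hypothetical example: no real growth, prices up by one half.
weekly = initial_income(5200.0, 0.0, 4, 150.0, 100.0, class_b=True)
annual = initial_income(5200.0, 0.0, 4, 150.0, 100.0)
```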


Estimation of Growth Factors 


The only problem we have yet to deal with is the
calculation of the growth factors. And again, this is a
relatively simple problem. From the 1967 and 1971 Surveys of Consumer
Finance, we can obtain mean full employment net employment incomes
cross-classified as follows:

(i) age


14-17 


trae f 


22-28 


2o= 25 
(ii) sex

(iii) education

less than grade 9
less than grade 12
grade 12 or 13
some Univ. or CAAT
CAAT or Univ. Grad.
Post Graduate
Post Graduate 


We thus have $Y^{67}(a,s,e)$ and $Y^{71}(a,s,e)$, where $Y^{t}(a,s,e)$ is the
mean money income of all persons in a given age-sex-education class
in year t.


Now define real incomes as follows:
The growth factor for the given class is then defined by
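One way such a growth factor could be computed from the two survey means is sketched below. It is an assumption of this sketch, not a statement of the report's formula, that real incomes are obtained by expressing the 1967 mean in 1971 dollars and that the growth factor is the average yearly logarithmic growth over the four years. The CPI values and incomes in the example are hypothetical.

```python
import math


def growth_factor(mean_income_1967, mean_income_1971, cpi_1967, cpi_1971,
                  years=4):
    """Average yearly exponential growth in real mean income.

    Both means are for the same age-sex-education class.
    """
    # Express the 1967 mean in 1971 dollars (assumed deflation convention).
    real_1967 = mean_income_1967 * cpi_1971 / cpi_1967
    real_1971 = mean_income_1971
    return math.log(real_1971 / real_1967) / years


# Hypothetical class: prices rise 20 per cent, real income grows 3 per
# cent per year (compounded continuously) over four years.
g = growth_factor(1000.0, 1200.0 * math.exp(0.12), 100.0, 120.0)
```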
6.2.2 Property Income Parameters


The property income transition matrices were also derived 
from the UIC-DNR data base. The DNR record on this file contained
data both on an individual's dividend income and on his income in
the form of interest and returns from other investments. This
data exists for the three years 1969, 1970 and 1971. It was thus
possible to obtain two observations on single year dividend transition
matrices and single year interest transition matrices.
Aggregation of the Time Series Data


Two observations are not enough data to make reasonable 
inferences concerning such underlying determinants of property 
income transitions as the state of the business cycle, the rate 
of inflation, and so on. It was therefore decided to aggregate 
the two observations in order to reduce small sample error and 
randomness in the data. The aggregation process consisted simply
of adding the number of counts in any given cell of the matrix
for 1969-70 to the number of counts in the same cell for the
1970-71 matrix. We can make the assumptions inherent in this
process explicit: Let $(p_1, p_2, \ldots, p_n)$ be any row of the 1969-70
matrix and let N be the total number of counts in that row. Then
$(Np_1, Np_2, \ldots, Np_n)$ is the vector of counts for that particular
row. Similarly, let $(q_1, q_2, \ldots, q_n)$ be the same row of the
1970-71 matrix, and let M be the total number of counts in that
row. Then $(Mq_1, Mq_2, \ldots, Mq_n)$ is the vector of counts for that
row. Let $(s_1, s_2, \ldots, s_n)$ be the derived row of the aggregate
transition matrix. Then

$$s_i = \frac{N p_i + M q_i}{N + M}$$

That is, $s_i$ is just the weighted average of the original
probabilities, where the weight depends on the relative number of
counts in the original rows. If N is very small in comparison
with M, for example, then very little weight will be attached to
the 1969-70 probability. And this is what we wish, since if N
is small it is likely that there will be large errors in the
$p_i$'s as compared with the errors in the $q_i$'s.
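The pooling of two rows by adding their counts can be sketched as follows. The function name and the example probabilities and counts are hypothetical; the calculation is the count-weighted average described above.

```python
def aggregate_row(p, n_counts, q, m_counts):
    """Pool two estimated rows of a transition matrix.

    p, q are the row probabilities for the two periods; n_counts and
    m_counts are the numbers of observations behind each row.
    """
    total = n_counts + m_counts
    return [(n_counts * p_i + m_counts * q_i) / total
            for p_i, q_i in zip(p, q)]


# Hypothetical rows: only 10 observations in the first period against
# 90 in the second, so the pooled row lies close to the second.
s = aggregate_row([0.5, 0.5], 10, [0.2, 0.8], 90)
```

Here the pooled row is approximately (0.23, 0.77), much closer to the better-observed second period row than to the first.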


Stratification


The matrices were originally made dependent on both age
and income. There were 4 age classes (14-35, 35-50, 50-65, 65+)
and 3 income classes (0-7k, 7k-15k, 15k+).


(a) Age Effect 


It was felt that money property income transitions would 
almost certainly depend on age, although just what these effects 
would be was not entirely obvious. Differences between the first 
two age classes were felt to be ambiguous. On the one hand, 
people in their early earning years might hold their savings in 
the form of financial assets such as stocks and bonds, and then 
liquidate these during the "middle" earning period to acquire real 
assets such as houses, etc. This would imply decreased probability 
of raising one's money property income in the second age period. 

On the other hand, since income is correlated with age, it was felt 


that many people would not even begin to invest in financial assets
until they had reached the second age bracket. Having purchased
most of the "necessities" of life during their early earning
years, and having experienced an increasing income during these
years, they would now be in a position to invest in stocks, bonds,
etc. This would imply increased probabilities of raising one's
money property income in the 35-49 age period, especially for
those initially in the zero or very low property income classes.


The data reflected this ambiguity. If we examine 
Table 6.1, which illustrates the way that the interest income 
probabilities behave, we can see that for the low initial interest 
classes (less than $750), the probability of moving into a higher 
interest income class generally increases as people move into the 
second age bracket. For people in higher initial interest classes, 
on the other hand, the probabilities decrease as people reach the 
higher age class. Much the same behavior is exemplified in the 
dividend transition matrices as well. As explained above, these 
results are as one might expect. Whether the probabilities 
increase or decrease as one moves into the second age bracket 
is likely to depend on how much property income the person had 


to begin with. 


In comparing the third age class with the second, it
was thought that the probabilities should be higher in the former
for all income classes. That is, people in the 50-64 age class
should be able to increase their property incomes more frequently
than people in the 35-49 age class. The data indicated, quite
generally, that this was indeed the case. (Table 6.1 illustrates 


this behaviour for interest incomes. The dividend matrices are 


again similar.)

Table 6.1
Probability of interest income increasing or
remaining constant for persons in different
age classes and different initial interest
income classes

Interest Class    14-34    35-49    50-64    65+

[table entries not recoverable]

Table 6.2
Probability of interest income increasing or
remaining constant for persons in different
total income classes and different initial
interest income classes

[table entries not recoverable]
Comparisons between the 50-64 group and the 65 and over
group were expected to involve ambiguities. On the one hand,
retired people might run down their liquid assets to make up for
their lost employment income. Or, they might convert real assets
such as houses into assets yielding a monetary return. In the
former case, property income would go down; in the latter case,
it would go up. The data indicates that the latter tendency
is the strongest. The probability of remaining in the same position
or of improving one's property income increases as people move
into the 65 and over age group. (See Table 6.1)
(b) Total Income Effect 


Anticipated effects in the case of income stratifications
are less clear. It was felt that income would be significant
in explaining the level of a person's property income. But
it was not at all obvious that income was relevant in explaining
changes in property income. The data tended to reflect these
presumptions. The first two rows of the transition matrices varied
quite strongly with income. As income increased, the probability
of moving from the $0 class or the $1-250 class to a higher
property income class increased quite significantly. But for people
in higher initial property income classes, the probability of
doing better did not change much at all as income increased.
(See the probabilities for interest income in Table 6.2. The
dividend matrices reflect the same behaviour.)


This was a fortuitous but fortunate result. When the
transition matrices were stratified on both age and income, there
were very few observations in the last 11 rows. This meant that
a large sampling error would be introduced into the probabilities
in these rows if both stratifications were kept. Because the
observations in these rows did not depend significantly on income
(as was inferred by examining matrices aggregated over age) it
was possible to aggregate these rows over income. The first two
rows, however, in which income was important, did contain enough
observations to make the complete age-income stratification
possible. The net result, then, for both interest and dividends,
was two sets of matrices. The first, containing 2 rows and 13
columns, were stratified on age and income. The second, containing
13 columns and the remaining 11 rows, were stratified on age alone.
6.2.3 Retirement Income Parameters 


The retirement income process in POLSIM consists simply 
of the assignment of an initial pension to people when they first 
retire, provided they are eligible for a private pension. A model 
of how this initial retirement income will change over time has 
not been considered in this version of POLSIM. It would involve 
consideration of such factors as the type of pension plan a person 
had, whether or not he is a holder of an annuity, the kind of
annuity, changes in his plan that are consequent on the death of
his spouse, and so on. Such a model would hopefully be incorporated
into a later version of the Market Income Block. For the present,
retirement incomes, once established, are assumed to be constant


through time. 


The initial pension distribution was taken from the
Statistics Canada publication, "Survey of Pension Plan Coverage
1965," Catalogue number 74-506, Table 35. The table gives the
distribution of annual pensions as a percentage of final annual
earnings for pension plan members who retired during the year
ending December 31, 1965. The distribution is by sex and annual


earnings group. 
6.3 Validation of the Market Income Block 
6.3.1 Introduction
The parameters of the Market Income Block were validated 
by constructing expected income distributions on the basis of 


the transition matrices that are input to the block. The approach 


was to first compile sets of income distributions from the 1967 


Survey of Consumer Finance. Each distribution corresponded to a 
particular transition matrix. (For example, annual wages of
Class A persons, ages 14-35). These distributions were then 


multiplied by the fourth power of the relevant transition matrix, 
to produce the expected 1971 distributions. The actual 1971 
distributions were then constructed from the 1971 Survey of 
Consumer Finance, and these were compared with the expected 1971 
distributions. The results of these comparisons are presented in 


the graphs on the following pages. 
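The expected value calculation amounts to applying the one-year transition matrix four times to the 1967 distribution. A minimal sketch follows; the two-class distribution and matrix are hypothetical, and the distribution is treated as a row vector multiplying the matrix from the left.

```python
def apply_transition(distribution, matrix):
    """One year: new[j] = sum over i of distribution[i] * matrix[i][j]."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    return [sum(distribution[i] * matrix[i][j] for i in range(n_rows))
            for j in range(n_cols)]


def expected_distribution(distribution, matrix, years=4):
    """Apply the same (time invariant) matrix once per year."""
    for _ in range(years):
        distribution = apply_transition(distribution, matrix)
    return distribution


# Hypothetical two-class example: everyone starts in the low class and
# half of the low class moves up each year; the high class is absorbing.
START_1967 = [1.0, 0.0]
MATRIX = [[0.5, 0.5],
          [0.0, 1.0]]

expected_1971 = expected_distribution(START_1967, MATRIX)
```

Because no random sampling is involved, the result carries no Monte Carlo error; it also ignores entrants and exits, as the text goes on to note.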


The above procedure is approximately equivalent to doing
a four year simulation. It is not, however, identical, since an
actual simulation would account for new entrants into the labour 
force, demographic changes in the population, retirements, and so 
on. The expected value approach, on the other hand, has the 


advantages of cheapness and the absence of Monte Carlo error.
For this reason the method of expected values, as outlined
above, was used as preliminary validation. When the results
of the 1973 Survey of Consumer Finance become available, it
will be possible to do a simulation validation from the 1971
initial year tape. The two validations, together, will
permit the estimation of the extent of Monte Carlo error.
Finally, a full simulation using the entire model (see Ch. 7)
will provide the ultimate test (since it accounts for new
entrants, etc., as well as illustrating the extent of the
Monte Carlo errors). The expected value approach gives a
good test of the parameters per se, and indicates how well 


they can be expected to perform over a four year period. 


6.3.2 Annual Wage Transitions 


The first three graphs in figure 6.2 compare the 
actual 1971 annual income distributions with the corresponding 
expected distributions. Overall, the correspondence is 
quite good. The graph for the 14-35 age group shows a shift 
in the expected distributions towards higher incomes. To a
lesser extent this is also true for the 36-45 age group. 

The reason for these shifts is that the transition matrices 
for Class A persons were estimated from 1970-71 data. Real 
growth in wages and salaries for that year was 6.9%, whereas 
it averaged only 5.8% over the 1967-71 period. Consequently 
one would expect the transition matrices to yield a higher 
overall distribution for the 14-35 age group, which is the 
age class with the highest growth in income. To a lesser 
extent, one would expect the same phenomenon for the 36-45 
age group as well. Since this is exactly what happens, and 
since the two distributions are otherwise very similar, we 


have reasonable confidence in this set of matrices.

6.3.3 Weekly Wage Rate Transitions
The graphs in figures 6.2d - 6.2k illustrate the weekly 

wage rate comparisons. Again the correspondence between expected 
and actual is quite good, although two dissimilarities should be
noted. The first is a result of the fact that the real growth
rate implicit in the transition matrices is lower than the growth
rate that actually obtained over the 1967-71 period*, and the
second is caused by the absence of new entrants in the calculation 


of the expected values. 


The weekly transition matrices were calculated from 
1969-70 data. In that year the real growth in wages and salaries 
was 4.8%. In the 1967-71 period the average growth rate was 5.8%. 
Consequently one would expect the simulated 1971 distributions to 
be shifted slightly to the left, and in general it can be observed 


that this is indeed the case. 


The only exception to this leftward shift arises in the 
case of persons in the 14-35 age group. The shift for this group
is counterbalanced by the distortion arising from the absence of 
new entrants. Since the calculation of the expected 1971 distri- 
bution is based solely on transitions made by the 1967 working 
population, and hence takes no account whatever of new entrants 
to the labour force over the 67-71 period, the 1971 expected 
distribution is representative of a mature working population: 
one that has been working for at least 4 years. This population 
will obviously have a higher average income than the corresponding 


actual population which includes new entrants. The effect of this
will be to make the expected frequencies too low for low
income classes (where the majority of new entrants fall),
and consequently too high for high income classes.

*This error can be corrected, in an actual simulation, by varying
the exogenous rates of inflation.


Taking these two sources of error into account, 
the comparison is quite good. The simulation model will 
eliminate the problem of new entrants, thus producing better 
results than are indicated by the comparison of the expected 
and actual distributions. The problem of an implied rate of 
real growth within the transition matrices remains. But as 
mentioned above, this can be corrected for by adjusting the 
exogenous rates of inflation. A future version of the model 
could possibly generalize the transition matrices to account 


for time variance. 


6.3.4 Property Income Transitions 


The last four graphs in figure 6.2 refer to property 
incomes. Since the 1967 Survey of Consumer Finance did not 
collect data on dividends, only the "interest and other invest- 
ment income" component of the property income transition matrices 
was tested. It is fair to assume that the dividend transitions 


would reflect much the same behaviour as the interest transitions. 


The four graphs indicate that although the expected 
distributions follow the actual distributions quite closely, there 
is a general tendency for the transition matrices to overestimate 


interest income. This is especially evident if we examine the zero 


income class. For all four age groups, the expected frequencies 


in the zero income class are less than the actual frequencies.
And as a consequence, the expected frequencies for the positive
income classes are all slightly higher than the respective


actual frequencies. 


The reason for this is that the data from which the
transition matrices were estimated includes only persons who 
file tax returns or who make unemployment insurance contributions 
in the year for which the data applies (1969, 70, or 71). Those 
persons who would not fall into either of these two categories 
would be people who would be expected to have zero property 
incomes. Consequently the transition matrices will be biased in 
favour of inducing higher property incomes. This is a limitation 
that cannot be corrected by existing data. The Department of 
National Revenue is currently initiating a study that would
bring non-filers into their data base. Once this has been 
accomplished, a re-estimation of the transition matrices should 
eliminate the above bias. In any case, as the graphs indicate, 


the error is not particularly serious.

7. Full Model Simulation: 1967 to 1971

7.1 Introduction

The complete POLSIM model was tested by simulating
the four year period from 1967 to 1971. The initial year
population for the simulation consisted of 397,960 individual
records derived from the 1967 Survey of Consumer Finance.

The input parameters to the simulation (unemployment rates, 

total number of immigrants, and rates of wage inflation) 


were the actual values that obtained over the four year period. 


The computer program was set up so that all of the 
separate blocks would follow one another automatically. 
The simulation then proceeded one year at a time. For any 
given year, the Immigration block was run first. The output 
file of new immigrants was then merged with the previous year 
final output file (or the initial year population in the 
case of the 1967-68 simulation). This new file was then 
input to the Demographic Block which in turn passed its 
output file to the Activity Block for processing of the 
Activity variables. The Market Income Block then updated 
the income components of the individual state vectors thus 
completing a one year simulation. This whole procedure was 
carried out four times, the final output being a synthetic 


population representative of the 1971 population of Canada. 
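The order in which the blocks are chained within each simulated year can be sketched as below. The function and dictionary key names are hypothetical stand-ins for the four blocks, the population is represented simply as a list of records, and the stub blocks in the example do nothing except mark new immigrants; the sketch shows only the sequencing described above.

```python
def simulate(initial_population, years, blocks):
    """Run the blocks in sequence for each year.

    blocks maps hypothetical names to callables standing in for the
    Immigration, Demographic, Activity and Market Income Blocks.
    """
    population = list(initial_population)
    for year in years:
        # Immigration is run first and merged with last year's output.
        immigrants = blocks["immigration"](year)
        merged = population + immigrants
        # Each later block receives the output file of the one before.
        merged = blocks["demographic"](merged, year)
        merged = blocks["activity"](merged, year)
        population = blocks["market_income"](merged, year)
    return population


# Stub blocks, for illustration only.
STUB_BLOCKS = {
    "immigration": lambda year: ["immigrant-%d" % year],
    "demographic": lambda records, year: records,
    "activity": lambda records, year: records,
    "market_income": lambda records, year: records,
}

final_population = simulate(["resident"], [1968, 1969, 1970, 1971],
                            STUB_BLOCKS)
```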


Once the simulated 1971 population had been produced, 
it was possible to compare it to another estimate of the
1971 population as measured by the 1971 Survey of Consumer 
Finance. This comparison was then a measure of the adequacy of 


the POLSIM model for simulations of four years or less.

The details of the simulation from 1967 to 1971
are presented in Section 7.2 below. The comparisons between
the simulated 1971 population and the other estimate of the
1971 population are then discussed in Section 7.3.

7.2 Simulation Results: 1967-1971

The results of the simulation of new immigrants
over the four year period are summarized in Table 7.1. It
can be seen that the model slightly underestimates the total 
number of immigrants, even though the actual number is an 
exogenous input. The reason for this is that children are 
created stochastically in the Immigration Block, and that 
slight adjustments have to be made to the number of married 
women so as to equate them with the number of married men. 


These adjustments have been discussed extensively in Chapter 3. 


The results of the Demographic Block processes are 
summarized in Tables 7.2 to 7.5. The aggregate totals 
produced by the Demographic Block are compared with actual 
figures derived from Vital Statistics. In general, the 
model performs very well. The tables indicate that the 
model tends to underestimate the number of live births, 
deaths and divorces while overestimating the number of 


marriages. 


The analysis of the errors inherent in the Demographic 
Block processes is presented in Chapter 4. This analysis is, 
however, related to particular groups with common probability 


of success, and is not directly applicable to the case of
Table 7.1

Comparison of simulated and actual data*

Total number of immigrants: 1968-71

          Simulated      Actual

1968       160,250      183,974
1969       157,850      161,531
1970       144,200      147,713
1971       118,450      121,900

* Source: Immigration Statistics, 1967-1971.
aggregates. Any error in an aggregate figure, such as the
total number of births or deaths, can be attributed to the
following causes: (1) errors in the initial population
file; (2) errors in the probability parameters; (3) simulation
errors; and (4) the additive or cancelling effect of the


above three types of errors. 


In Chapter 4 we have analyzed the first three 
types of errors. However, the fourth type is much more 
complex, since it is related to the whole spectrum of the 
population groups at risk. For this reason we will not give 
a complete analysis of the errors of each process. Rather, 
we will point out informally the reasons for any deviation 


of our simulated aggregates from the actual ones. 


The results of the birth process are given in 
table 7.2, and it can be seen that the simulation begins 
with a fairly large error in 1968 which then declines until 
1971 when the results are almost perfect. Part of the error 
in the early years (approximately 15,000 live births) can be 
attributed to initial population errors. In particular, the
initial population is largely underestimated for women in 
the 20-24 age group, and since this is the prime child- 
bearing age, a large underestimate in births will naturally 
ensue. The underestimate in births declines as the simulation 
progresses for two reasons. First the model uses stationary 
fertility probabilities estimated for the latest year. The 
birth rate declined over the period 1967-1971, and since 
this was not reflected in the probabilities, one would 
expect any underestimate to decline as time goes forward. 


Second, the underestimate will also decline as time progresses
because the effect of the initial population error will
decrease. In the initial population, females in the 15-19 

age group are only slightly underestimated (as compared with 

the large underestimate in the 20-24 age group). Therefore 

in each succeeding year, the 20-24 female age group will 

more closely approximate the true population in that age 

group, and hence the underestimate in births which results 

from an underestimate in the 20-24 population will progressively 
be eliminated. In examining the regional results, Table 7.2
indicates that there is no evidence that fertility probabilities 


should be regionalized. 


The divorce process results are given in table 
7.4, and are much better than expected. It is known that 
Canada passed through a transient period during the time 
of the simulation, insofar as the incidence of divorces is 
concerned, due to a change in the divorce law in 1969. This 
made the simulation of divorce quite difficult, and hence 
the results are on the whole quite pleasing. It is obvious, 
however, that the regionalization of the divorce probabilities 
should be seriously considered. In 1970, for example, the 
model significantly overestimates the number of divorces in 
Quebec while underestimating the number in Ontario. It is 
clear that the number of divorces in these two provinces


cannot be considered to be outcomes of the same stochastic 


process. 


Table 7.3 indicates that marriages are slightly
overestimated. This can be explained by the fact that any 
individual of marital status "other" is, in the model, 
eligible for marriage. We recall that the state-variable 


marital-status can attain three state-codes, i.e. single, 
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Table 7.2 


Comparison of simulated and actual data * 


Live Births by Region: 1968-71 


Atlantic Quebec Ontario Prairies B.C. Canada

1968 Simulated 265750 81,600 104,000 44,300 285390 285,000 
Actual 40,306 96,622 1263257 659770 S3708/ 364,310 

1969 Simulated 27 $050 672350 114,050 48,550 30,000 306,800 
Actual 40,322 G5 5610 L305398 66,256 3595993 369 ,647 

1970 Simulated 29 30:50 92,000 124,200 539200 33,500 332,000 
Actual 40,200 BLS! L3ARLZ4 66,658 36,861 SPLSISS 

1971 Simulated 32,800 94,850 129 7000 D5 5050 3623050 348,950 
Actual 41,307 89,210 1303395 64,630 34,602 DO2 5 HOT 


* Source: Vital Statistics - Statistics Canada Catalogue 84-201. 
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Table 7.3 


Comparison of simulated and actual data *
Marriages by Region: 1968-71 


Atlantic Quebec Ontario Prairies B.C. Canada

1968 Simulated 18,200 54,050 Se HES) 853525 20,850 199,400 
| Actual 16,665 46,004 62,109 29,678 16,914 171,766 
1969 Simulated 18,500 556375 7 25 32,450 19,700 1375450 
Actual 175420 Bye Das 67,150 D16375 18,284 182,183 

1970 Simulated 19,800 55, 600 (B65 34,900 19,950 203,875 
Actual Lf, 6815 49,607 68,874 3,620 20,026 188,429 

1971 Simulated 19,950 Sore wes) 748560 35,000 218 25 207,250 
Actual 18,678 49,695 69,590 324004 20, 389 191,326 

Table 7.4


Comparison of simulated and actual data * 


Divorces by Region: 1968-71 


Atlantic Quebec Ontario Prairies B.C. Canada

1968 Simulated Loi Seeks 1362) S522 L,@50 205,950 
Actual 675 606 D036 21 OD 25220 11,343 

1969 Simulated ee!) 6,075 yiasi010) $}5,5)50) PES25 20,800 
Actual 13362 2930 Li, eas 5,648 4,224 26,079 

1970 Simulated 1,400 6,450 S000 papa! 140) 2,600 vay fs 16) 
Actual 1,414 4,865 12,451 55 G10 Se ial 29.7095 

1971 Simulated 2,000 Pee) 8,450 2, 800 2,400 21,625 
Actual a We 55195 125489 5,835 4,942 29,626 


* Source: Vital Statistics - Statistics Canada Catalogue 84-201. 
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married, and other. The "other" includes the widowed, 
separated and divorced. Including the separated in the
population eligible for marriage clearly introduces a positive
bias.


It can be seen from table 7.5 that the death
process results in an underestimate in the number of simulated
deaths. This underestimate is 9.4%, 13.6%, 11.2%, and 2.2%
respectively in the years 1968 through 1971. Most of this
error (approximately 8%) is a consequence of the initial
population underestimate, while simulation error can account
for another +4%.


The effects of all of these various population 
flows are presented in Tables 7.6 and 7.7. Table 7.6 shows 
the magnitudes by which total population is changed by the 
flow processes of birth, death, immigration, and emigration. 
It also demonstrates the "Law of Conservation of Population". 
If the procedures by which new records are created through
births and immigration are working properly, and if the
procedures by which individual records are deleted through
death and emigration are also working properly, then the
final output population in any given year should equal the
initial population plus the sum of births and immigrants
less the sum of deaths and emigrants. The data in Table 7.6
indicate that the model does "conserve" population in this
sense.


Table 7.5
Comparison of simulated and actual data* 


Deaths by Region: 1968-71 


Atlantic Quebec Ontario Prairies B.C. Canada

1968 Simulated 14,750 33 5200 50,800 24s 350 15,800 138,900 
Actual 15026 BOR a 7: Bere 255538 16,828 13535 29.5 

1969 Simulated £35650 31,900 47,650 295400 14,600 0335 400 
Actual Nise as 40.103 Don OT. Zone L723 0g AD4aS477 

1970 Simulated 135650 35300 51,000 255 900 15, 200 1573050 
Actual i Wakage hh 40,392 56,769 25,440 173020 155.961 

1971 Simulated 14,950 32,800 51,450 26 soso 18,650 144,400 
Actual 153.8 40,738 56,623 25,905 Lis 73 Loi sere 


* Source: Vital Statistics - Statistics Canada Catalogue 84-201. 
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Table 7.6 


Simulated population flows, 


                          1967        1968        1969        1970
Initial Population   19,898,000  20,152,450  20,407,050  20,669,600
  +
Births                  285,000     306,800     332,000     348,950
  +
Immigrants              180,250     157,850     144,200     118,450
  -
Deaths                  138,900     133,450     137,050     144,400
  -
Emigrants                71,900      76,600      76,600      75,850

Predicted Simulated
Population at end    20,152,450  20,407,050  20,669,600  20,916,750
of simulation year

Actual Simulated
Population at end    20,152,450  20,407,050  20,669,600  20,916,750
of simulation year
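
The "Law of Conservation of Population" stated in the text can be checked arithmetically against the Table 7.6 flows. The following is a minimal sketch in Python and is not part of the POLSIM software: the dictionary layout and the function names are assumptions made for illustration, and the 1970 immigrant count is taken as 118,450, the value that makes that column balance.

```python
# Table 7.6 flows, keyed by the year heading each column.
# ASSUMPTION: the 1970 immigrant count is read as 118,450 so the column balances.
TABLE_7_6 = {
    1967: {"initial": 19_898_000, "births": 285_000, "immigrants": 180_250,
           "deaths": 138_900, "emigrants": 71_900, "final": 20_152_450},
    1968: {"initial": 20_152_450, "births": 306_800, "immigrants": 157_850,
           "deaths": 133_450, "emigrants": 76_600, "final": 20_407_050},
    1969: {"initial": 20_407_050, "births": 332_000, "immigrants": 144_200,
           "deaths": 137_050, "emigrants": 76_600, "final": 20_669_600},
    1970: {"initial": 20_669_600, "births": 348_950, "immigrants": 118_450,
           "deaths": 144_400, "emigrants": 75_850, "final": 20_916_750},
}


def predicted_end_population(row):
    """Initial population plus births and immigrants, less deaths and emigrants."""
    inflows = row["births"] + row["immigrants"]
    outflows = row["deaths"] + row["emigrants"]
    return row["initial"] + inflows - outflows


def conservation_residuals(table):
    """Reported end-of-year population minus the predicted one, per column."""
    return {year: row["final"] - predicted_end_population(row)
            for year, row in table.items()}


if __name__ == "__main__":
    for year, residual in conservation_residuals(TABLE_7_6).items():
        print(year, residual)
```

A residual of zero in every column is what the text means by the model "conserving" population: no records are lost or duplicated when the file is advanced by a year.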
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Table 7.7 compares the sex-region populations 
produced by POLSIM with the same distributions as reported 
by Vital Statistics. It can be seen that there is a general 
underestimation on the part of the model. This can be 
explained by the fact that the Survey of Consumer Finance 
underestimates the total population (by excluding the Yukon 
and N.W.T., military personnel, and persons in institutions) 
and because the model itself underestimated the number of 
births in each of the simulated years. Over the four year 
period the underestimation in population that can be attributed
to the model itself is 128,276 (sum of underestimates in
births less sum of underestimates in deaths). The total
underestimate in the 1971 simulated population is seen from
Table 7.7 to be 780,260. Of this total, 16% can be crudely
attributed to the error generated by the model while 84% can
be attributed to the error in the initial year population.
This is in fact an upper estimate of the model error. The
fact that the initial population is too small to begin with
implies that one would expect an underestimate in the number
of births and deaths. Therefore part of the underestimate
arising from the model is in fact attributable to the error
in initial year population. (A more sophisticated analysis
of the error in the model itself can be carried out along


the lines discussed in Chapter 4.)
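
The crude 16%/84% split quoted above is a simple ratio of the two underestimates given in the text. The sketch below reproduces that arithmetic; the function name `split_error` is hypothetical and the code is an illustration only, not part of the model software.

```python
def split_error(model_error, total_error):
    """Return (model share, initial-population share) of the total underestimate."""
    model_share = model_error / total_error
    return model_share, 1.0 - model_share


# Both figures are quoted in the text.
MODEL_UNDERESTIMATE = 128_276   # births shortfall less deaths shortfall
TOTAL_UNDERESTIMATE = 780_260   # shortfall in the 1971 simulated population

model_share, initial_share = split_error(MODEL_UNDERESTIMATE, TOTAL_UNDERESTIMATE)

if __name__ == "__main__":
    print(f"model: {model_share:.1%}, initial population: {initial_share:.1%}")
```

As the text notes, the model share computed this way is an upper estimate, since part of the births and deaths shortfall is itself inherited from the undersized initial population.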


The results of the labor force simulations are 
summarized briefly in table 7.8 and Figure 7.1. Table 7.8 
presents the unemployment rates, by sex, that the model 


produces over the entire four year period and compares these 
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Comparison of simulated and actual data 
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Table 7.7 


* 


Population Distribution by Sex and Region: 1968-71 


Male Simulated 
Male Actual 


Female Simulated 
Female Actual 
Male Simulated 
Male Actual 
Female Simulated 
Female Actual 
Male Simulated 
Male Actual 
Female Simulated 
Female Actual 
Male Simulated 


Male Actual 


Female Simulated 
Female Actual 


* Source: 


Atlantic 


986,800 
1, 009; 300 


See 18) 
291557700 
984,950 
1,013,800 
9735100 
998,200 
IG25 350 
POLS, 400 
Diiggno0 
1,002,600 
980,200 
13,038,215 


O73, 100 
1,019 ,040 


Quebec 


2.920.200 
2,956,600 


2,952,600 
2,970,400 
2,933,000 
2,982,400 
2,962,700 
3,001,600 
2,945,450 
2,993,000 
2,974,800 
3,020,000 
2,955,150 
2,994,550 


2e9G a 5h 0 
0835205 


Ontario 


3,495,800 
3,649 ,800 


3,541, 850 
250 20,200 
35 508,300 
35 /21,500 
3,019,850 
eWay ps) FPG) 8) 
3,03 15000 
3,812,000 
35696.) 200 
35020, 000 
Sedu. 
3,840,905 


3,767,850 
358625200 


Prairies 


1,640,200 
1,953,000 


T3615.600 
1,704,000 
1,647,400 
Let sap oOw 
16205250 
LR s00 
1, 6505, 250 
13733, 100 
1,628,500 
1,739,900 
Loo L450 
B Ri giles WR at ES) 


1,641,500 
1,749,240 


B.C. 


O91 7 50 
1,016,400 
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with the actual rates for the same period as measured by the 
labor force survey. Figure 7.1 plots the monthly aggregate
simulated unemployment rate and the monthly aggregate actual 
unemployment rate over the four year period. It can be seen 
that the model tracks the unemployment rate very well, and 
there is no tendency for it to get "off-track" as time 
progresses. There is a slight tendency, however, for the 
simulated rate to be too high. This was expected, because
the simulation parameters had been adjusted to fit the
higher unemployment rates of the period April 1972 to April
1973 (see Chapter 5). This adjustment was such as to
increase the resulting simulated unemployment rate slightly
from that which would have resulted from the original equations.
Since the original regression equations of the labor-force
model had been fitted to data from the 1959-1969 period, the
adjustment would be expected to simulate too many unemployed 
persons over the four years 1967-71. The adjusted equations 
would be expected to perform better over a four year period 


beginning in 1971.


The Market Income simulation is summarized in 
Tables 7.9-7.11. For each of the component incomes (employment 
income, property income, and retirement income) distributions 
for the four simulated years are presented. As standards of 
comparison, the same distributions from the 1967 and 1971
SCF surveys are also given. It can be seen that the simulations 
perform as one would expect; there is a general tendency for 


the distributions to shift to the right as time progresses. 
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Table 7.9 


Employment Incomes 
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Table 7.10 


Property Incomes 


Income Base '67 Final '68 Final '69 Final '70 Final '71 Base '71

Categories 
in $ 
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Table 7.11 


Retirement Incomes 


Income Base '67 Final '68 Final '69 Final '70 Final '71 Base '71

Categories 
in $ 
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Since no adequate income data exists for the years between 
1967 and 1971, it is not possible to examine how well the 
simulations perform year by year. All that can be compared
are the final results (the distributions for the year 1971).
It should be noted that the comparisons for property income
in Table 7.10 are not really meaningful, due to the fact
that the 1967 survey did not distinguish between dividend
and interest income. As a result, the model simulated total
property income with interest income transition matrices,
and hence the resulting final simulated property income is 
not strictly comparable with the results obtained from the 


1971 SCF survey. 


A more detailed comparison between the final 1971 
simulated results and the actual data for 1971 is given in


the next section. 


7.3 Analysis of the Simulated 1971 Population 


Final comparisons between the 1971 simulated distributions
and the 1971 distributions as measured by the SCF
survey are presented in Tables 7.12-7.24. These tables show
how the populations in the two samples are distributed over
demographic characteristics (age, sex, province, etc.),
activity variables (weeks employed etc.), and market income 


characteristics. 
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Comparison of simulated and base year populations by region for 1971 
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Table 7.14 


Comparison of simulated and base year populations by age groups 


for 1971
Age Simulated Base Year
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Table 7.15 


Comparison of simulated and base year populations by sex 


for 1971
Sex Simulated Base Year
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Comparison of simulated and base year populations by family status 
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Comparison of simulated and base year populations by marital status 


for 1971
Marital Status      Simulated     Base Year
Single             10,097,450    10,435,050
                       48.27%        49.01%
Married             9,544,800     9,462,600
                       45.63%        44.44%
Other               1,274,500     1,395,550
                        6.09%         6.55%
TOTAL              20,916,750    21,293,200
                      100.00%       100.00%
Table 7.18


Comparison of simulated and base year populations by number of weeks in school


for 1971
Weeks in
School Simulated Base Year
0 £44680;859 13,639,800 
70°..19% 64.06% 
1-12 Sa OO. L950 
ise 0.06% 
13-28 620,400 344-500 
2a oe 5 cae 8 
29-44 5298 7 500 Dp SO aU 
254.93 34.903 
454 0 0 
0% 0% 
TOTAL 20,916,750 21,293,200
100.00% 100.00%
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Table 7.19 


Comparison of simulated and base year populations by number of weeks employed


for 1971
Number of 
Weeks Employed Simulated Base Year 
0 LL 439 pos 12,004,100 
54.69% 56.38% 
1-12 5907200 853,100 
2.82% 4.01% 
13-24 £72027 500 655,100 
5.75% 34.08% 
25-36 Ls, 700 7634900 
5.66% BOs 
37-48 57 810,550 TLZ pO5.0 
8.66% 3.34% 
49-52 4 O91 250 6,304,650 
De 29.613 
TOTAL 20,916,750 21,293,200


100.00% 100.00% 
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Table 7.20 


Comparison of simulated and base year populations by number of weeks unemployed


for 1971
Number of
Weeks Unemployed Simulated Base Year
0 18,713,550 19, 6357250 
89.76% 92.22% 
1-12 1,176,450 709,450
5.62% 3.34% 
13-24 663,100 Bil Ooo 
elie De ee 
20-06 25), 0 293 7 OU 
1, 03% Lge Oe 
37-48 73-7200 185,400 
0.342% 0.873 
49-52 U5. 250 Ol, 700 
0.073% 0.43% 
TOTAL 20,916,750 21,293,200
100.00% 100.00%
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Comparison of simulated and base year populations by number of weeks 


in non-labour force for 1971 


Number of Weeks 


in Non-Labour Force Simulated Base Year 
0 645749850 1H 4d spa 
3h e482 35.202 
1-12 Boao 0 L poo peo 
255s 36.5 es 
13-24 256,600 504,500 
6.013 PN a 
25-36 VAS, 950 426,400 
Ee AUN 2 OOS 
37-48 986,700 35395 50 
WOW 1.663% 
49-52 FOO 250 4. 22 900 
28.59% 22) oes 
TOTAL 20,916,750 21,293,200


100.00% 100.00%
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Comparison of simulated and base year populations by employment income 


categories for 1971 




Employment 
Income Simulated Base Year 
No Income Tt 474,050 12°,054., 750 
54 7643 56.61% 
$1-499 437,800 896,550 
2.093% 4s 
$500-999 6247550 608,200 
2.983% 286% 
$1K-1499 607 600 458,650 
PAs xO Zoe 
$1500-2K 545 3.00 389,200 
Zo Glee L836 
$2K-2499 G1 7000 368,300 
2925 Les 
$2500-3K 61.6570 50) S320 50 
22955 Loos 
$3K-3999 1095 755.0 788.650 
Faas os 
$4K-4999 933,000 798,450 
4.46% Beis 
$5K-5999 834,850 191, 600 
a2 99s Seas 
$6K-6999 629,200 Aster ten) 
32012 Seeeles 
$7K-7999 492,700 689,950 
2.363% 31.242 
$8K-9999 Tigo U LOGS 200 
ey ie 4.99% 
$10K-12K 448,900 590,300 
2205% Zend 
$12K-15K 508,950 403,700 
2,434 150s 
$15K-25K 286,400 256,200 
Liar Le 20S 
$25,000+ 59,600 PRS Boll 
0.28% 0.36% 
TOTAL 20,916,750 21,293,200 
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Comparison of simulated and base year .opulations by interest 


income categories for 1971


Interest Income Simulated Base Year
No Income 14,806,200 17,723,200
70.79% 83.24%
$1-249 A753 30 5250 2/7247;5450 
IO 6 TAS IO) pes 
$250-499 589,850 2165550 
2 aoe 1.96% 
$500-749 306,100 222A TOC 
1.463 1.063 
$749-999 196,350 T5d6B 50 
0.943 (Ror ak” 
$1K-2K 353,800 2837,2006 
1.693% 1.322 
S2K-3K 1271050 114,950 
0.583 0.543 
S$ 3K-4K 83,900 Soro OU 
0.403 0.235 
S4K-5K 40,400 36,000 
O92 Obs 
S5K-8K 51,900 38,650 
On25% O2LSs 
S8K+ 34,950 29,500 
(etal 0.14% 
TOTAL 20,916,750 21,293,200


100.00% 100.00%
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Comparison of simulated and base year populations by retirement income 


categories for 1971


Retirement
Income Simulated Base Year
No Income 20,474,100 20,744,400 

97.89% 97.43% 
$1-249 56,550 63,500 

Omens 0.3% 
$250-499 Sy PA ROLe 68,050 

OL27 0.325 
$500-749 465350 557600 

Oe223 0.262 
$750-999 47,400 49,300 

OF 2a. 0.239 
$1K-1499 64,650 T2250 

0 ss 02342 
$1500-2K 47.850 61,400 

G2 235 OL293 
$2K-3K 59,800 Whose ies) 

0.29% Or36e 
$3K-4K 207050 45250 

On kss Gee a Bes 
$4K-5K boy 550 25D 

0.073 eds 
$5K-8K 14,350 2 Ang DOO 

0.072% Ois dis 
$8K+ 4300 S250 

0.022 0.043 
TOTAL 20,916,750 21,293,200


100.00% 100.00% 
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In assessing these distributions it is not possible 
to assert that they are "good" or "bad" in any objective 
sense. What we have to do is come up with some notion of 
whether the simulated population, when viewed comprehensively 
(over all pertinent variables), is adequate for the purposes
to which it is to be put. This is clearly not a question
which permits an unequivocal answer. Whether or not the 
simulation is "adequate" will depend on a number of factors. 
It will depend on the particular purpose for which the 
simulation is being used, on alternative sources of data, on 
the level of disaggregation at which we wish to view the
results, and so on. And in the final analysis, it will also
depend on the user's subjective idea of just what "adequate" 
means for his particular objective. Rather than attempting 
to answer the question of adequacy once and for all, the 
present statement will simply report the results that were 
obtained. It will be left to the individual user to determine 


how "good" these results are. 


Tables 7.12-7.17 summarize the results of the 
Demographic Block simulation. By the standard of comparative 
final distributions, the demographic block is the most 
satisfactory part of the POLSIM model. It can be seen that
over all of the demographic variables the 1971 actual and 


1971 simulated distributions are very close indeed. 


This is to be expected, of course, since demographic 
characteristics are either easy to simulate (age and sex for 
example) or else affect small proportions of the population 


(death, emigration, etc.). They would therefore be expected
to compare closely to the base year data. There is a discrepancy
in the total populations, however, which should be noted. 

The simulated final population is 20,916,750 while the 1971 
survey estimated population is 21,292,640. The 1971 survey 
estimated population is therefore 1.79% higher than was 

obtained through the simulation. The reason for this discrepancy 
is complex, since it depends on the simulation of births, 

deaths, immigrants, and emigrants, as well as on the system 

of weighting that is used by the SCF survey. The errors in 

the simulation proper have already been discussed. The 
immigration totals accounted for a cumulative underestimate 

of 14,368 persons. It is difficult to say anything valid at 

all about the emigration estimates, because there exists no 
accurate data on the true extent of emigration. For the 

present we will ignore any underestimate that might occur on 

this account. Births and deaths result in a cumulative 
underestimate of 127,256 persons. The total underestimate 
resulting from the simulation is therefore approximately

141,624 persons, which means that it is still necessary to 


account for an underestimate of approximately 234,266 persons. 
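
The accounting in this paragraph can be retraced step by step. The sketch below uses only the figures quoted in the text; the function name `reconcile` and the dictionary keys are hypothetical, chosen for illustration.

```python
# Figures quoted in the text.
SURVEY_1971 = 21_292_640            # 1971 SCF estimated population
SIMULATED_1971 = 20_916_750         # simulated final population
IMMIGRATION_SHORTFALL = 14_368      # cumulative immigration underestimate
BIRTHS_DEATHS_SHORTFALL = 127_256   # cumulative births and deaths underestimate


def reconcile(survey, simulated, *simulation_shortfalls):
    """Split the survey-minus-simulation gap into a simulation part and a residual."""
    gap = survey - simulated
    explained = sum(simulation_shortfalls)
    return {
        "gap": gap,
        "gap_pct_of_simulated": 100.0 * gap / simulated,
        "simulation": explained,
        "residual": gap - explained,
    }


if __name__ == "__main__":
    result = reconcile(SURVEY_1971, SIMULATED_1971,
                       IMMIGRATION_SHORTFALL, BIRTHS_DEATHS_SHORTFALL)
    for name, value in result.items():
        print(name, value)
```

The residual is the part of the gap that the simulation errors do not explain and that the following paragraph attributes to the survey weighting dates.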


This can be explained by the way in which the SCF 
surveys are weighted. The weights for the 1967 SCF apply to 
the population as of December 1967, while the weights for 
the 1971 SCF apply to the population as of June 1972. This 
is a difference of four and one half years. The simulation, 
on the other hand, is for four years only, and hence one 
would expect the 1971 SCF population to slightly exceed the 


population generated by the simulation. 
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The Activity characteristics, which are summarized 
in Tables 7.18-7.21, would not be expected to fare so well 
as the demographic variables. This is because the activity 
variables are simulated indirectly (through a month to month 
simulation), and because the whole population is subject to 
extensive changes. The Tables indicate that on the whole 
the simulated activity characteristics, though not as close 
as are the demographic characteristics, are still fairly


close to the base year distributions. 


The most important of the activity variables is 
weeks of unemployment, and this variable is given in Table 
7.20. It can be seen that the simulated distribution is 
more "flat" than the base year distribution; the simulation 
tends to simulate unemployment for a relatively larger group 
of people, but at the same time to make the duration of this
unemployment shorter. Thus the simulation "unemploys" 

10.24% of the population for at least some period, compared 
with 7.78% in the base year. In the simulation, however, 
54.8% of this group is unemployed for less than 12 weeks, 
compared with 42.9% for the base year group. Interestingly, 
the total number of weeks of unemployment over all individuals 
is only slightly less for the simulated group (28.1 million 
weeks versus 29.1 million for the base year). This "flattening" 
of the distribution is a consequence of the Markov-one 

nature of the simulation. In a Markov-one process, a person's 
present state is assumed to depend only on his state in the 
immediately preceding period. In this instance, this is
clearly not an adequate assumption. Whether or not a person
is to become unemployed depends on his employment history

for several preceding periods. This is a refinement that


may be dealt with in future versions of the model. 
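
To make the Markov-one assumption concrete, the sketch below simulates a month-to-month employment path in which next month's state depends only on the current state. It is an illustration under stated assumptions and not the POLSIM labor force block: the two-state layout, the function names and every probability used are hypothetical. Under this assumption the length of an unemployment spell follows a geometric distribution, so short spells are the most probable ones, which is in line with the concentration of short durations noted above.

```python
import random

EMPLOYED, UNEMPLOYED = "E", "U"


def simulate_path(p_lose_job, p_stay_unemployed, months, rng, start=EMPLOYED):
    """Month-by-month path in which the next state depends only on the current one."""
    state, path = start, []
    for _ in range(months):
        if state == EMPLOYED:
            state = UNEMPLOYED if rng.random() < p_lose_job else EMPLOYED
        else:
            state = UNEMPLOYED if rng.random() < p_stay_unemployed else EMPLOYED
        path.append(state)
    return path


def spell_length_probabilities(p_stay_unemployed, max_months):
    """P(an unemployment spell lasts exactly k months), for k = 1..max_months."""
    return [p_stay_unemployed ** (k - 1) * (1.0 - p_stay_unemployed)
            for k in range(1, max_months + 1)]


def expected_spell_length(p_stay_unemployed):
    """Mean of the geometric spell-length distribution, in months."""
    return 1.0 / (1.0 - p_stay_unemployed)


if __name__ == "__main__":
    rng = random.Random(1967)   # fixed seed so the run is repeatable
    # HYPOTHETICAL probabilities, chosen only to show the mechanism.
    print("".join(simulate_path(0.03, 0.6, 48, rng)))
    print(spell_length_probabilities(0.6, 6))
    print(expected_spell_length(0.6))
```

Making the transition probabilities depend on several preceding periods, as the text suggests for future versions, would amount to enlarging the state to include recent employment history.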
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The results of the Market Income simulation are 
presented in Tables 7.22-7.24. The tables do not include 
the results of a simulation of dividend income due to the 
fact that dividends were not distinguished from other property 
income in the 1967 survey. As mentioned earlier, the simulation 
moved all of this property income forward as if it were 
interest income. The interest income comparisons can therefore 
not be expected to be very close. They do indicate, however, 
that the interest income transition matrices will not produce 


incomes that are wildly off-track. 


The results of the employment income simulation 
may be examined in figure 7.2 and table 7.22. It can be 
seen that the simulated distribution is too low at the low 
end of the income scale, becomes too high for the lower- 
middle income groups, becomes too low again for the upper- 
middle income classes, and then finally is too high for the 
higher income groups. The tendency for the simulated distribu- 
tion to exhibit a lower variance than the base year distribution, 
at least among the low and middle income groups, may be 
explained in part by the simulation of unemployment. As 
explained above, the Markov-one nature of the unemployment 
simulation tends to flatten the distribution of weeks of 
unemployment. That is, too many people have small amounts 
of simulated unemployment while too few have large amounts 
of simulated unemployment. This bias in the unemployment 
simulation will affect Class B persons (approximately 80% of 
the labor force - those who are in occupations which are 
likely to result in at least some unemployment) but not
Class A persons (the remaining persons in the labor force,


the 20% whose occupations are such that they are never likely to




become unemployed). The flattening of the unemployment 
distribution among Class B persons will tend to lower the 
variance in employment incomes. There will be too few 
simulated persons with very low incomes (which are a result

of high unemployment), and too many with low to middle 

incomes (a result of too many relatively low wage persons - 

the Class B group - with at least some unemployment experience). 
This effect will be mitigated as income increases because 

Class A persons will increasingly tend to dominate the 


distribution as incomes increase.


It will be noted that the employment income simula- 
tion is not as good as would be expected from a cursory 
examination of the results of Chapter 6. Figures 6.2b and 
6.2f present the results of a validation of the market
income block parameters for prime age males, the group which 
forms the largest proportion of persons in the labor force. 
These graphs demonstrate that the annual wage transition 
matrices and weekly wage transition matrices are such as to 
reproduce annual and weekly wage distributions over a four 
year period almost exactly. These expected 1971 distributions 
do not include any simulation error, since they are produced 
by multiplying the 1967 distributions by the fourth power of 
the relevant matrix. But one would expect a very small
simulation error in any case, due to the large number of 
persons in the simulation. So an examination of Chapter 6 
alone would lead one to expect an excellent simulation of 
employment income. But for the reasons stated above, the 
actual simulation does not yield the almost perfect results 


that might otherwise be expected. 
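
The "expected" four-year distribution mentioned above is obtained by multiplying the base year distribution by the fourth power of a transition matrix, with no random draws involved. The sketch below shows that calculation in plain Python. The three income classes, the matrix entries and the base distribution are all hypothetical, and the row-vector convention (rows are the class of origin, columns the class of destination) is an assumption.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def mat_power(matrix, exponent):
    """Matrix raised to a non-negative integer power by repeated multiplication."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(exponent):
        result = mat_mul(result, matrix)
    return result


def project(distribution, transition, years):
    """Expected distribution after the given number of annual transitions."""
    return mat_mul([list(distribution)], mat_power(transition, years))[0]


# HYPOTHETICAL annual transition matrix; each row sums to one.
TRANSITION = [
    [0.80, 0.15, 0.05],
    [0.10, 0.75, 0.15],
    [0.05, 0.10, 0.85],
]
BASE_1967 = [0.50, 0.30, 0.20]   # HYPOTHETICAL base year shares

if __name__ == "__main__":
    print(project(BASE_1967, TRANSITION, 4))
```

Because this projection is deterministic, any gap between it and a simulated distribution reflects sampling error in the simulation together with the interaction with other blocks, such as the unemployment simulation discussed above.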
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A better simulation of employment income probably 
could be produced, provided one wanted to simulate income 
independently of the underlying labor force activity that 
produces employment income. It is the attempt to explicitly
model labour force activity which creates difficulties in 
the present model. We have indicated above ways in which 
these difficulties may be overcome in future versions of the 


model. 


The retirement income simulation, as can be seen 
from Table 7.24, is very good. It tends to follow the base 


year distribution almost exactly.


8. The Policy Block


In one sense all of the other blocks of POLSIM are 
a prelude to the Policy Block. The ultimate objective of 
advancing the model population through time is not achieved 
until the Policy Block, with its models of government programs, 
has been run. In this Chapter we shall first consider the
problem of evaluating government programs from a purely 
technical point of view in the context of POLSIM. We shall 
then briefly consider the program models or policy algorithms 
which have been developed as part of the POLSIM project, 
leaving a fuller discussion of these models to a later 
report. Next we shall indicate how the Policy Block is run.
Finally, we give an example of the application of a policy 


algorithm. 


8.1 Evaluating the Effects of Government Programs


In general there are two classes of effects caused 
by government programs. The first we call real effects 
(e.g. changes in relative prices, changes in work effort, 
etc.) and the second we refer to as financial effects (e.g. 
changes in disposable money income). In modelling the 
operation of government programs, our microcomponents are 
cast in a particular macroeconomic environment. This macro- 
economic environment is defined in the model by the exogenous 
specification of such things as price indices and unemployment 
rates, which then translate into real effects (e.g. unemployment) 
for particular individuals. These real effects are determined 
before the Policy Block begins. That is, the real effects 
do not depend in any formal way on the government programs 


contained in the Policy Block. Neither do the policy algorithms 
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of the Policy Block produce real effects directly on the 
microcomponents. Individual behavior is unaffected by the 
particular policies modelled. Financial effects, then, are 
calculated under the assumption that there is no feedback 


from the policy algorithms to real effects. 


This kind of treatment is rigid if not unrealistic, 
and it does limit the usefulness of the model. However, it 
does not mean that no recognition whatever can be made of 
probable behavioral effects caused by individual government 
programs. For instance, in the case of tax-transfer work 
disincentive effects it is possible to build behavioral 
response into the model. Nevertheless, the existing structure 
of the model does mean that a given time track will remain 
undisturbed by alterations of policy algorithms in the 
Policy Block. That is, possible current behavioral effects 
of individual government programs, while they can be made 
to affect today's outcomes, do not affect tomorrow's possibilities 
or events. This we regard as the most serious limitation of 
the POLSIM model and it arises mainly because of the absence 


of explicit treatment of capital stocks in the model.


8.2 Policy Algorithms 


A number of policy algorithms have been constructed
or are under development as part of the POLSIM project. For 
the moment we shall do no more than list the names of the 
government programs modelled at this time. A later report
will document software and contain tests performed to establish 
the accuracy of simulation results achieved using these 


algorithms. Algorithms exist for the following government 


programs: 
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(a) Personal Income Taxes (federal and provincial) 


(b) Old Age Security 


(c) Guaranteed Income Supplement 


(d) Canada (Quebec) Pension Plan Contributions 


(e) Unemployment Insurance Premia 


(f) Hypothetical Negative Income Tax Programs


8.3 Running the Policy Block


The Policy Block, comprised as it is of a series 
of individual algorithms, does not possess a structure which 
is in any way similar to the other blocks. The Policy Block 
proper is embodied in a computer program, RESULT, whose 
function is to call subroutines expressing the policy algorithms 
and to accumulate and print out in convenient tabular form 
the effects of these policies. Program RESULT (see Appendix 
F) can be readily altered to accommodate a wide variety of 


reporting formats.


The Policy Block takes as input the file describing
the model population for some given year and proceeds to 
produce the program effects for that same year. Since the 
Policy Block does not affect any given time track, it may be 


run either for all years of a given projection or for selected 


years only. 
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The computer program used for producing the 
distribution tables cross-classifies the policy effects in a
number of different ways. There are twenty-two different 
classifications (e.g., province, sex, marital status, total 

income, etc.). Each classification is divided into a varying 
number of categories (e.g. ten categories for the provinces, 

two for sex, seventeen for total income, etc.). Any combination 
of pairs of different classifications can be used to produce 

the desired cross-classification tables, up to a maximum of


22 cross-classifications. 


The program will read either an individual's 
record or a family's records, depending on which input is 
required for the policy being studied. The user specifies 
his choice by setting the value of an input flag. An output 
flag must also be set by the user. This flag determines 
whether distributions of individuals are required in the 
output tables or whether distributions of families are 
required. Family distributions are produced from the 
characteristics of the head of the family in all cases but 
income. The sum of the family's incomes is used for assign- 


ment to income categories of family distributions. 


After reading each record (or records in the case 
of families being read) the program determines which cells 
in the output tables are relevant for that particular record. 
Each characteristic (e.g. province, sex, income) of the 
individual or family head is assigned to the appropriate 
category for that characteristic. (For example, the province 
category for an individual from Newfoundland is one, for B.C. 
it is ten; the income category for no income is one, for a 
total income above $25,000 it is 17). In the case of a 
family's records being read and individual distributions 
produced, each member of the family is treated as an indi- 
vidual, and each characteristic of each family member is 


assigned to the appropriate category. 


Once it has been determined where in the pre-specified 
distributional categories the given individual (or family) 
will fall, the micro-effects of the program to be studied 
are calculated for this particular unit. This requires calling 
the appropriate policy algorithm which computes the desired 
effects. A tax algorithm, for example, would calculate 
the total taxes paid by a given family. A UIC contribution 
algorithm would calculate the UIC contributions payable by a 


given individual. 


The effects thus determined are then added to the 
effects calculated for all other persons in the same category. 
For example, if the person is from Newfoundland and has a 
total income of $7,000, the taxes he pays would be added to 
the taxes paid by all other individuals from Newfoundland 


in the $7,000-$8,000 income bracket. 


The program proceeds in this fashion until all 
individuals or families have been read. It then prints out 
the cross-classification tables desired by the user. These 
tables present both the absolute effects and the percentage 
distributional effects. The program listing and an example 


of the program output is given in Appendix F. 
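
The tabulation steps described above can be sketched in a few lines of Python. This is an illustrative sketch only and not the RESULT program listed in Appendix F: the province code for British Columbia, the intermediate income cut points, the record layout and the function names are all assumptions made for the example. Only the codes for Newfoundland (one), no income (one) and income above $25,000 (17) are taken from the text.

```python
from bisect import bisect_right
from collections import defaultdict

# Assumed province codes; the text fixes only Newfoundland = 1.
PROVINCE_CODE = {"Newfoundland": 1, "British Columbia": 10}

# Assumed lower edges separating income categories 2 to 16.
# The text fixes only category 1 (no income) and 17 (above $25,000).
INCOME_EDGES = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000,
                9000, 10000, 12000, 15000, 18000, 21000]


def income_category(income):
    """Map a total income to one of 17 categories."""
    if income <= 0:
        return 1
    if income > 25000:
        return 17
    # An income exactly on an edge falls in the bracket that starts there.
    return 2 + bisect_right(INCOME_EDGES, income)


def tabulate(records, effect):
    """Accumulate a policy effect into province-by-income cells.

    Returns the absolute effects per cell and each cell's
    percentage share of the total effect.
    """
    table = defaultdict(float)
    for record in records:
        cell = (PROVINCE_CODE[record["province"]],
                income_category(record["income"]))
        table[cell] += effect(record)
    total = sum(table.values())
    shares = {cell: (100.0 * value / total if total else 0.0)
              for cell, value in table.items()}
    return dict(table), shares
```

Here `effect` stands in for whichever policy algorithm is being studied; with a tax algorithm, two Newfoundland records with incomes of $7,000 and $7,500 would have their taxes added to the same cell.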
8.4 Example of Policy Simulation: The Personal 
Income Tax Algorithm 


The Personal Income Tax algorithm is a computer 
program designed to calculate the effects of the personal 
income tax on the family records that are output of the POLSIM 
model. The input to the program is the POLSIM state vector 
of a single family, the CPP and UIC contributions of each 
individual family member (these quantities it should be noted 
are themselves outputs of policy algorithms), the inflation 
factor necessary to adjust tax brackets and exemption levels, 
and the year for which the simulation is to apply. The output, 
for each family member, is the following: his income (taxation 
definition), his "tax status" (whether he is an unmarried family 
head, a dependent, a married man whose wife is deductible, a 
married woman whose husband is deductible, a married person 
who has a larger income than his spouse who files a separate 
return, a married person who has a smaller income than his 
spouse who files a separate return, or an independent child), 
his federal taxable income, his federal tax payable, his 
provincial tax payable, and if he is a resident of Quebec, 


his Quebec taxable income. 


The program in its present version incorporates all 
of the changes in the tax legislation up to and including 
the February 19, 1973 Budget (see Appendix F). It begins 
by calculating the basic tax parameters for the given year: 
the various exemption levels as determined by the inflation 
index, and the marginal tax rate to apply in the first tax 
bracket for the year being simulated. The program then 


calculates each family member's taxation income. Employment 
expenses, tuition, and CPP-UIC contributions are deducted if 
applicable, and dividend income is grossed up by 1/3. The 
person's income includes those items present on the individual 
state vector (employment income, interest, dividends, 
retirement income, and other money income) plus income items 
which are produced by other policy algorithms and are taxable. 
No attempt is made in the present version of the model to 


impute capital gains. 
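
The income calculation just described can be illustrated as follows. This is a minimal sketch under stated assumptions, not the algorithm of Appendix F: the field names are hypothetical, and the rules governing when each deduction is "applicable" are not given in the text, so every deduction is simply passed in as an amount.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Income items carried on the individual state vector.
    employment: float = 0.0
    interest: float = 0.0
    dividends: float = 0.0
    retirement: float = 0.0
    other_money: float = 0.0
    # Taxable income produced by other policy algorithms (hypothetical field).
    taxable_transfers: float = 0.0
    # Deductions, supplied as amounts (eligibility rules not modelled here).
    employment_expenses: float = 0.0
    tuition: float = 0.0
    cpp: float = 0.0
    uic: float = 0.0


def taxation_income(p):
    """Income on the taxation definition for one family member."""
    # Dividend income is grossed up by one third.
    grossed_up_dividends = p.dividends + p.dividends / 3.0
    gross = (p.employment + p.interest + grossed_up_dividends
             + p.retirement + p.other_money + p.taxable_transfers)
    deductions = p.employment_expenses + p.tuition + p.cpp + p.uic
    return gross - deductions
```

Capital gains do not appear in the sketch because, as noted above, the model does not impute them.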


Once the individual incomes are calculated, tax 
status is determined. This then enables all of the various 
applicable deductions to be calculated for each family member: 
the basic exemption for an adult, the standard charity-medical 
deduction (assumed to be $100.00 for all persons), the old-age 
deduction, the marriage equivalent deduction for a dependent 
in the absence of a spouse, the deductions for children 
(which depend on the child's income), and the spouse deductions. 
Taxable income is thus determined (income minus total deductions), 
and the program proceeds to the calculation of tax. This 
is done by determining which tax bracket the person is in, 
and then summing the tax payable at the beginning of that 
bracket with the marginal tax payable on the income within 
the given bracket. Provincial tax is then calculated, and 


the program returns to receive the record of another family. 
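
The bracket calculation can be sketched as below. The rate schedule is purely hypothetical and is not the 1972 or 1973 federal schedule; the function names are likewise assumptions, and flooring taxable income at zero is our assumption rather than a rule stated in the text. The sketch shows only the mechanism: the tax payable at the beginning of the bracket plus the marginal tax on the income within the bracket.

```python
from bisect import bisect_right

# Hypothetical schedule: (lower bound of bracket, marginal rate).
SCHEDULE = [(0.0, 0.10), (2000.0, 0.20), (6000.0, 0.30)]


def bracket_bases(schedule):
    """Tax payable at the lower bound of each bracket."""
    bases = [0.0]
    for (lower, rate), (next_lower, _) in zip(schedule, schedule[1:]):
        bases.append(bases[-1] + rate * (next_lower - lower))
    return bases


def taxable_income(income, deductions):
    """Income minus total deductions, floored at zero (assumed floor)."""
    return max(0.0, income - sum(deductions))


def tax_payable(taxable, schedule=SCHEDULE):
    """Tax at the start of the bracket plus marginal tax within it."""
    if taxable <= 0:
        return 0.0
    lowers = [lower for lower, _ in schedule]
    i = bisect_right(lowers, taxable) - 1   # index of the person's bracket
    lower, rate = schedule[i]
    return bracket_bases(schedule)[i] + rate * (taxable - lower)
```

Adjusting the brackets for inflation would amount to scaling the lower bounds in `SCHEDULE` by the inflation factor before the bases are computed.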


It is not possible to completely validate the 
present tax algorithm. To do so would require that we simulate 
1973 taxes and then compare our results with data compiled 


by DNR for the 1973 taxation year. And this in turn requires 
that DNR data be available. Unfortunately, the most recent 
DNR data that exists is for the 1971 taxation year, and this 
is also the year of the most recent SCF survey. That is, 

it is the most recent year for which a base year population, 


as opposed to a simulated population, is available for policy 


simulations. 


These data limitations resulted in something of a 
dilemma, insofar as validating the tax simulation was concerned, 
because we did not possess a 1971 tax algorithm. Indeed, there 
is no reason why we would construct one, because the purpose 
of POLSIM is to simulate policies into the future, not into 
the past. As a compromise it was decided to use the 1972 
tax algorithm (an earlier version of the algorithm described 
above), to simulate 1971 taxes. This would not give us an 
ideal check on the validity of the algorithm, but it would 
enable us to check whether the algorithm produces severe 
distortions. To the extent that the changes introduced 
by the 1972 tax reform did not grossly change either the 
absolute rate of tax (on total assessed income), or the 
distribution of tax across income categories, the simulation 
could be expected to compare very well with the data collected 


by DNR. 


The results of the simulation are summarized in 
Table 8.1. Two simulations were carried out, one on the 1971 
SCF population, and the other on the 1971 population generated 
from the 1967 base year by the POLSIM model. Both of these 


simulations are then compared with the data derived from DNR 


records. 
TABLE 8.1 

SIMULATION OF TOTAL TAX BY INCOME CLASS 

                1971            Simulation From   Simulation From 
Income          DNR             1971 SCF          1971 POLSIM 
Class           Statistics      Population*       Population* 
                DOLLARS         DOLLARS           DOLLARS 
                (000's)         (000's)           (000's) 

Under 1500      22              6                 0 
                0.03%           0.00%             0.00% 

1500-2499       53,055          [illegible]       56,142 
                0.70%           0.36%             0.67% 

2500-3999       347,096         258,906           402,334 
                4.17%           [illegible]       4.78% 

4K-4999         422,448         [illegible]       431,404 
                5.07%           4.01%             [illegible] 

5K-5999         [illegible]     494,530           [illegible] 
                6.38%           5.57%             [illegible] 

6K-6999         [illegible]     616,905           [illegible] 
                7.42%           6.95%             6.32% 

7K-7999         [illegible]     732,240           521,807 
                8.65%           8.25%             [illegible] 

8K-9999         [illegible]     [illegible]       [illegible] 
                16.74%          [illegible]       [illegible] 

10K-12K         [illegible]     [illegible]       [illegible] 
                12.45%          [illegible]       [illegible] 

12K-15K         [illegible]     1,096,040         [illegible] 
                [illegible]     [illegible]       16.62% 

15K+            [illegible]     [illegible]       [illegible] 
                [illegible]     28.94%            31.06% 

TOTAL           8,330,886       8,870,995         [illegible] 
                100.00%         100.00%           100.00% 

* 1972 tax structure 
It can be seen that the simulation performs very well. 
The simulation on the 1971 SCF population matches the actual 
data very closely, both in terms of the distribution and 
in the absolute amounts. The simulation does, however, tend to 
slightly underestimate tax at the low end of the income scale 
and to overestimate it at the upper end. Both of these 
tendencies are what one would expect. To begin with, one 
of the objectives of the 1971 tax reform was "to give tax 
relief to Canadians of lower incomes". Thus the application 
of the 1972 rules to the 1971 population would in itself be 
sufficient to shift the distribution to the upper end of the 
income scale. In addition to this, there are other reasons 
why the simulation would tend to shift the distribution of 
tax. In actual practice it is possible for persons with very 
low assessed income to pay relatively large amounts of tax. 
This situation can arise because: (1) returns may be filed 
by non-residents of Canada in respect of income from Canada 
which is not subject to personal exemptions; (2) individuals 
who are resident in Canada for only part of a taxation year 
will have their exemptions pro-rated to the period in which 
they earned their income; and (3) some returns are taxable 
only in respect of lump sum pension payments which are excluded 
from total income. It is not possible at present to simulate 
any of these effects. In addition to these effects on the 
low income classes, simulated taxes in the higher income 
classes will tend to be overestimated. This is because the 
simulation only takes account of the minimum exemptions that 
could arise. Other exemptions which are not accounted for 
could arise because business or farm losses of earlier years 
may offset the current year's income, or because of such 
factors as foreign tax credits, registered retirement plans, 


unusual medical expenses, allowable deductions from investment 


income, or gifts to the Crown. 
The tax simulation on the 1971 simulated population 
can be seen to compare quite closely with the simulation on the 
base year population, at least in terms of total taxes 
generated. The distributions are different, however, which 
is a result of the differences in the distribution of income. 
The extent and reasons for these differences in the income 


distributions are documented in Chapter 7. 


The above problems notwithstanding, it is possible 
to conclude from the present validation that the existing tax 
algorithm performs more than adequately. A more conclusive 
judgement will become possible when the taxation statistics 


for the 1972 taxation year are released by DNR. 
9. Concluding Remarks 


This chapter is concerned to set down some of the 
most important things we have learned about micro-data 
modelling in general and to suggest improvements which may 


be made in the POLSIM model in particular. 


The first of these can be disposed of fairly 
quickly. The lessons may be summarized in three statements. 
First, micro-data modelling of whole populations is quite 
expensive, particularly in the model development stage. Our 
problems were perhaps exacerbated because of the necessity 
of utilizing confidential data at Statistics Canada but the 
fact remains that one is manipulating large amounts of data 
and this can be costly both in terms of time and money. Cost 
can be lowered, of course, by utilizing more efficient 
computing systems and by smaller samples in certain instances. 
We have done some work on the former question but not on the 
latter. It may be true, for example, that most of the 
processes we were concerned to model can be done well enough 


with a sample half the size of the one utilized. 


Second, it is important to commence information 
interchange between model analysts and computer systems 
experts at a very early stage of model development. This 
also relates to the question of cost. There is no reason 
why the most efficient programming of the model cannot be 
the first programming. The chances of achieving this are 
obviously immeasurably greater if this question is raised 


at the time of the development of the model structure. 


Third, it is to be preferred if the applications 
of the microdata model can be well defined early on. This 
would mean that the absolute minimum of data necessary 
for the individual state vector could be quickly identified 
and costly changes avoided. The composition of the individual 
state vector adopted for the present version of POLSIM 
represents a notional compromise between policy issues of 
assumed importance, adequate detail for modelling in relation 
to these policy issues, and cost. This process of compromise 
will be easier in situations where the policy questions can 


be well specified in advance. 


We now turn to the question of particular improve- 
ments to POLSIM. One feature, not of POLSIM proper but of 
the simulation exercise, which should receive attention is 
the adjustment of the initial year sample to better align it 
with other population measures. Given enough independent data 
for this purpose, it should be possible to bring initial 


population errors close to zero. 


In the Demographic Block there is an obvious need 
to make certain of the probability parameters, for example 
the fertility probabilities, time variant. There is also a 
need to extend the stratifications on some of these variables. 
The divorce probabilities, for example, could be stratified 
by the characteristics of both spouses, rather than just 
one. And marriage probabilities could be made conditional 
on the education of the spouse. Theoretical analysis can 
also be easily extended in the Demographic Block. For example, 
one could develop a more rigorous model of the divorce 
process. Is divorce to be restricted to legal separation or 


should it be extended to include any type of separation? 
All of the parameters in the Immigration Block are 
time invariant and derived from the data available for the 
1971 year. In particular, the province-age-sex-marital status 
distribution of new immigrants applies to the year 1971 alone. 
It would clearly be desirable to estimate this distribution, 
from time series data, as a function of (say) economic con- 
ditions in Canada and abroad and possibly other variables as 
well. It would also be useful to attempt a model which 
predicted the total number of immigrants as a function of 
economic conditions, both in Canada and in the largest 


countries from which immigrants come. 


Several extensions are possible for the Activity 


Block: 


(i) The Class A - Class B distinction could be extended 
or improved by re-defining the classes (on the 
basis of occupation for example), re-estimating 
the transition matrices conditional on class, or 
increasing the number of classes. Data to do this 
exists in the labor force survey, and occupation 
could be added to the state vector because it is 


carried in the SCF survey. 


(ii) The entire approach could be changed. One could 
imagine estimating distributions of weeks employed, 
weeks unemployed, etc., perhaps as functions of 
macro-economic conditions, occupation or class, 
age, sex, region, and so on. The problem here 
would be one of data, since the labor force survey 


is not directly amenable to such an approach. 


(iii) The present Markov-Chain model could be extended 
to a 2nd order model, and could perhaps be made 
conditional on occupation or class (see (i) above). 
The data for this exists in the labor force survey. 
The number of conditioning variables (occupation, 
age, sex, region, etc.) is of course limited by 
the size of the survey sample. The present model 
used one method to disaggregate transition matrices 
to additional conditioning variables, and further 


research could also be done in this area. 
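
The difference between a first-order and a second-order chain, as raised in (iii), can be made concrete with a small sketch. Everything in it is hypothetical: the two states, the probabilities and the function names are invented for illustration and bear no relation to the estimated Activity Block matrices. The point is only that a second-order model conditions the next state on the two preceding states rather than on one.

```python
import random

# Hypothetical states: "E" employed, "U" unemployed.
# First order: next state depends on the current state only.
FIRST_ORDER = {
    "E": {"E": 0.90, "U": 0.10},
    "U": {"E": 0.50, "U": 0.50},
}

# Second order: next state depends on (previous, current).
SECOND_ORDER = {
    ("E", "E"): {"E": 0.95, "U": 0.05},
    ("U", "E"): {"E": 0.80, "U": 0.20},
    ("E", "U"): {"E": 0.60, "U": 0.40},
    ("U", "U"): {"E": 0.30, "U": 0.70},
}


def draw(distribution, u):
    """Pick a state from a discrete distribution given a uniform draw u."""
    cumulative = 0.0
    for state, probability in distribution.items():
        cumulative += probability
        if u < cumulative:
            return state
    return state  # guard against rounding when u is very close to 1


def simulate_second_order(previous, current, periods, rng):
    """Extend an activity history by the given number of periods."""
    path = [previous, current]
    for _ in range(periods):
        distribution = SECOND_ORDER[(path[-2], path[-1])]
        path.append(draw(distribution, rng.random()))
    return path
```

Each additional conditioning variable multiplies the number of such matrices to be estimated, which is why the survey sample size limits how far the conditioning can go.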


There are several extensions that appear to be 


possible in the Market Income Block. 


(i) The income state variables could be extended. It 
would be useful, for example, to distinguish various 
categories of employment income: self employment 
income (non-farm and farm), and wages and salaries. 
This would require a model in which the change in 
one of these income components was made conditional 
on all of the other components (including the 
relationship between employment and investment 
income). The present model for the most part 
assumes that the various income components are 
independent of one another. The DNR longitudinal 
data would be the only source of parameters, but 
there is sufficient data there to carry out the 


necessary estimations. 

(ii) The income change process itself could be modeled 
as a function of (perhaps) macro-parameters such 
as the rate of inflation, the rate of change of 
GNP, and so on. This would be in addition to the 
recognition that is already made of demographic 
variables and would introduce an implicit time 
variance to the transition matrices. Again, the 
DNR data would probably be sufficient here. 


(iii) Because of the obvious relationship between employment 
and income, it would be very desirable to 
eliminate the present general distinction between 
the Activity and Market Income Blocks. To some 
extent these blocks are at present tied together, 
of course, because it is necessary that the employment 
and income variables be consistent. One 
could improve on this, however, by thinking of 
activity-income as one distinct process, and 
conceiving of a model that would take account 
simultaneously of all of the activity-income 
variables. The difficulty here is one of data. 
Ideally, one would like to have a longitudinal data 
base that contained both income and activity 
variables. Unfortunately the two best data bases 

- the labor force survey and the DNR file - do 

not fulfill this requirement. The DNR data does 
not contain employment data, and the labor force 
survey does not contain income data. The closest 
approximation to the ideal is perhaps the UIC-DNR 
merge file. This data base suffers, unfortunately, 


from being out of date, and from various weaknesses 
associated with the UIC data (the activity 
variables for the whole population - school, 
NLF, employment, unemployment - are not covered 
adequately). Perhaps one could construct a new 
data base, linking UIC records, DNR records, and 
some parts of the labor force survey. This would, 
of course, be a complex and costly process. 


(iv) The present model was estimated primarily from the 
UIC-DNR data base. Many parameters could perhaps 
be improved and conditioned on more demographic 
characteristics if the larger DNR data base were 
used. This of course would be much more expensive 
and much more time consuming (due to the difficulty 


of special-request access to the DNR data). 


